Overview

Brought to you by YData

Dataset statistics

Number of variables: 107
Number of observations: 724508
Missing cells: 37354957
Missing cells (%): 48.2%
Total size in memory: 576.9 MiB
Average record size in memory: 835.0 B
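The two derived figures follow from the raw counts. The missing-cell percentage is missing cells divided by variables × observations, and the average record size is total memory divided by observations. The report does not state these formulas, so treat them as assumptions, but they reproduce the printed values:

```python
def missing_cell_pct(missing_cells: int, n_vars: int, n_obs: int) -> float:
    """Share of all cells (variables x observations) that are missing, in percent."""
    return 100.0 * missing_cells / (n_vars * n_obs)


def avg_record_bytes(total_mib: float, n_obs: int) -> float:
    """Average in-memory size per row in bytes (1 MiB = 1024 * 1024 bytes)."""
    return total_mib * 1024 * 1024 / n_obs


pct = missing_cell_pct(37354957, 107, 724508)
size = avg_record_bytes(576.9, 724508)
print(f"{pct:.1f}%")  # 48.2%
print(f"{size:.1f} B")
```

The total memory figure is already rounded to 576.9 MiB, so the recomputed average lands within about 0.1 B of the reported 835.0 B rather than matching it exactly.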

Variable types

Numeric: 20
Text: 83
Boolean: 4

Dataset

Description: NMNH Paleobiology Specimen Records (USNM) 0049391-241126133413365
URL: https://doi.org/10.15468/dl.ws2uf3

Alerts

license has constant value "CC0_1_0" Constant
publisher has constant value "National Museum of Natural History, Smithsonian Institution" Constant
institutionID has constant value "http://biocol.org/urn:lsid:biocol.org:col:34871" Constant
collectionID has constant value "urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac" Constant
institutionCode has constant value "USNM" Constant
collectionCode has constant value "PAL" Constant
datasetName has constant value "NMNH Paleobiology (USNM)" Constant
basisOfRecord has constant value "FOSSIL_SPECIMEN" Constant
occurrenceStatus has constant value "PRESENT" Constant
verbatimCoordinateSystem has constant value "Degrees Minutes Seconds" Constant
datasetKey has constant value "c8681cc2-9d0a-4c5f-b620-5c753abfe2bc" Constant
publishingCountry has constant value "US" Constant
typifiedName has constant value "Type" Constant
protocol has constant value "EML" Constant
lastCrawled has constant value "2024-12-02T10:02:33.848Z" Constant
isSequenced has constant value "False" Constant
publishedByGbifRegion has constant value "NORTH_AMERICA" Constant
hasGeospatialIssues is highly imbalanced (98.1%) Imbalance
catalogNumber has 50535 (7.0%) missing values Missing
recordNumber has 675939 (93.3%) missing values Missing
recordedBy has 563497 (77.8%) missing values Missing
preparations has 591600 (81.7%) missing values Missing
occurrenceRemarks has 638259 (88.1%) missing values Missing
fieldNumber has 720044 (99.4%) missing values Missing
eventDate has 474561 (65.5%) missing values Missing
startDayOfYear has 593923 (82.0%) missing values Missing
endDayOfYear has 593923 (82.0%) missing values Missing
year has 474684 (65.5%) missing values Missing
month has 572740 (79.1%) missing values Missing
day has 596444 (82.3%) missing values Missing
verbatimEventDate has 445814 (61.5%) missing values Missing
locationID has 335037 (46.2%) missing values Missing
higherGeography has 148417 (20.5%) missing values Missing
continent has 195168 (26.9%) missing values Missing
waterBody has 696851 (96.2%) missing values Missing
islandGroup has 723710 (99.9%) missing values Missing
island has 714401 (98.6%) missing values Missing
countryCode has 158422 (21.9%) missing values Missing
stateProvince has 226462 (31.3%) missing values Missing
county has 454433 (62.7%) missing values Missing
locality has 560871 (77.4%) missing values Missing
verbatimElevation has 724311 (> 99.9%) missing values Missing
verbatimDepth has 724424 (> 99.9%) missing values Missing
decimalLatitude has 620570 (85.7%) missing values Missing
decimalLongitude has 620570 (85.7%) missing values Missing
verbatimCoordinateSystem has 654265 (90.3%) missing values Missing
georeferenceProtocol has 695012 (95.9%) missing values Missing
georeferenceRemarks has 724503 (> 99.9%) missing values Missing
earliestEraOrLowestErathem has 220036 (30.4%) missing values Missing
latestEraOrHighestErathem has 718163 (99.1%) missing values Missing
earliestPeriodOrLowestSystem has 245750 (33.9%) missing values Missing
latestPeriodOrHighestSystem has 718167 (99.1%) missing values Missing
earliestEpochOrLowestSeries has 376914 (52.0%) missing values Missing
latestEpochOrHighestSeries has 718290 (99.1%) missing values Missing
earliestAgeOrLowestStage has 562472 (77.6%) missing values Missing
latestAgeOrHighestStage has 722133 (99.7%) missing values Missing
group has 633218 (87.4%) missing values Missing
formation has 365706 (50.5%) missing values Missing
member has 643191 (88.8%) missing values Missing
typeStatus has 582086 (80.3%) missing values Missing
identifiedBy has 521981 (72.0%) missing values Missing
acceptedNameUsageID has 171789 (23.7%) missing values Missing
higherClassification has 172643 (23.8%) missing values Missing
phylum has 192842 (26.6%) missing values Missing
class has 272566 (37.6%) missing values Missing
order has 369296 (51.0%) missing values Missing
family has 258765 (35.7%) missing values Missing
genus has 245070 (33.8%) missing values Missing
genericName has 244897 (33.8%) missing values Missing
specificEpithet has 449718 (62.1%) missing values Missing
infraspecificEpithet has 718207 (99.1%) missing values Missing
taxonomicStatus has 171789 (23.7%) missing values Missing
distanceFromCentroidInMeters has 723864 (99.9%) missing values Missing
mediaType has 637882 (88.0%) missing values Missing
acceptedTaxonKey has 171789 (23.7%) missing values Missing
phylumKey has 192842 (26.6%) missing values Missing
classKey has 272566 (37.6%) missing values Missing
orderKey has 369296 (51.0%) missing values Missing
familyKey has 258765 (35.7%) missing values Missing
genusKey has 245070 (33.8%) missing values Missing
speciesKey has 450165 (62.1%) missing values Missing
species has 450165 (62.1%) missing values Missing
acceptedScientificName has 171789 (23.7%) missing values Missing
verbatimScientificName has 171332 (23.6%) missing values Missing
typifiedName has 724501 (> 99.9%) missing values Missing
repatriated has 158317 (21.9%) missing values Missing
gbifRegion has 160612 (22.2%) missing values Missing
level0Gid has 686240 (94.7%) missing values Missing
level0Name has 686240 (94.7%) missing values Missing
level1Gid has 686243 (94.7%) missing values Missing
level1Name has 686243 (94.7%) missing values Missing
level2Gid has 687320 (94.9%) missing values Missing
level2Name has 687320 (94.9%) missing values Missing
level3Gid has 722506 (99.7%) missing values Missing
level3Name has 722506 (99.7%) missing values Missing
iucnRedListCategory has 365809 (50.5%) missing values Missing
individualCount is highly skewed (γ1 = 32.66226483) Skewed
gbifID has unique values Unique
occurrenceID has unique values Unique
taxonKey has 171789 (23.7%) zeros Zeros
kingdomKey has 171929 (23.7%) zeros Zeros
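Several alert types above follow simple rules. The sketch below assumes three of them. First, a column is Constant when it has exactly one distinct non-missing value, which is why verbatimCoordinateSystem can be both Constant and 90.3% missing. Second, percentages are printed with the "< 0.1%" and "> 99.9%" edge cases seen in this list. Third, the Imbalance score (98.1% for hasGeospatialIssues) is 1 minus the Shannon entropy of the value counts, normalised by log2 of the number of classes. These rules are inferred from the report, not copied from ydata-profiling's source, so thresholds and formulas are assumptions:

```python
import math
from typing import Optional, Sequence


def is_constant(values: Sequence[Optional[str]]) -> bool:
    """True when exactly one distinct non-missing value occurs (None marks missing)."""
    return len({v for v in values if v is not None}) == 1


def fmt_percent(fraction: float) -> str:
    """Format a 0..1 fraction with the '< 0.1%' and '> 99.9%' edge cases."""
    if fraction > 0 and round(fraction, 3) == 0:
        return "< 0.1%"
    if fraction < 1 and round(fraction, 3) == 1:
        return "> 99.9%"
    return f"{fraction * 100:.1f}%"


def imbalance(counts: Sequence[int]) -> float:
    """1 minus the Shannon entropy normalised by log2(number of classes).

    0.0 means perfectly balanced classes; 1.0 means all rows in one class.
    """
    total = sum(counts)
    if len(counts) < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(len(counts))


print(is_constant([None, "Degrees Minutes Seconds", None]))  # True
print(fmt_percent(724311 / 724508))  # > 99.9%
print(fmt_percent(723710 / 724508))  # 99.9%
print(imbalance([50, 50]), imbalance([100, 0]))  # 0.0 1.0
```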

Reproduction

Analysis started: 2025-01-08 21:23:36.996405
Analysis finished: 2025-01-08 21:23:59.935414
Duration: 22.94 seconds
Software version: ydata-profiling v4.12.1
Download configuration: config.json
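The reported duration is the difference between the start and finish timestamps, rounded to two decimals. A quick stdlib check:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2025-01-08 21:23:36.996405", FMT)
finished = datetime.strptime("2025-01-08 21:23:59.935414", FMT)
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 22.94 seconds
```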

Variables

gbifID
Real number (ℝ)

Unique 

Distinct: 724508
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1489761894
Minimum: 1316557246
Maximum: 4987259380
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.5 MiB

Quantile statistics

Minimum: 1316557246
5th percentile: 1316593481
Q1: 1316738419
Median: 1316919584
Q3: 1317100741
95th percentile: 3311023845
Maximum: 4987259380
Range: 3670702134
Interquartile range (IQR): 362322.5
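These quantiles can be reproduced with linear interpolation between order statistics, the default in NumPy and pandas; that ydata-profiling uses exactly this method is an assumption here. Python's `statistics.quantiles` with `method="inclusive"` implements the same interpolation. The printed Q1 and Q3 are rounded to integers, which presumably explains why their difference (362322) is 0.5 below the reported IQR.

```python
import statistics
from typing import Dict, Sequence


def quantile_summary(data: Sequence[float]) -> Dict[str, float]:
    """Quantile statistics using linear interpolation ("inclusive" method)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # n=20 yields cut points at 5%, 10%, ..., 95%; keep the first and last.
    vigintiles = statistics.quantiles(data, n=20, method="inclusive")
    return {
        "min": min(data),
        "p05": vigintiles[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": vigintiles[-1],
        "max": max(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
    }


print(quantile_summary(range(1, 10)))
```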

Descriptive statistics

Standard deviation: 567530383.1
Coefficient of variation (CV): 0.3809537521
Kurtosis: 11.81732068
Mean: 1489761894
Median absolute deviation (MAD): 181161.5
Skewness: 3.474773969
Sum: 1.07934441 × 10^15
Variance: 3.220907357 × 10^17
Monotonicity: Not monotonic
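A stdlib sketch of most of these descriptive statistics, assuming pandas-style definitions. It uses the sample standard deviation (n - 1 denominator), CV as standard deviation divided by mean, MAD as the median of absolute deviations from the median, and the adjusted Fisher-Pearson skewness G1. Kurtosis is left out for brevity. The function needs at least three values that are not all equal.

```python
import math
import statistics
from typing import Sequence


def describe(data: Sequence[float]) -> dict:
    """Std, CV, MAD, skewness (G1) and monotonicity of a numeric sequence."""
    n = len(data)
    mean = statistics.fmean(data)
    std = statistics.stdev(data)  # sample standard deviation (n - 1)
    med = statistics.median(data)
    mad = statistics.median(abs(x - med) for x in data)
    # Population central moments used by the skewness estimate.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    g1 = m3 / m2 ** 1.5
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)  # adjusted Fisher-Pearson G1
    increasing = all(a <= b for a, b in zip(data, data[1:]))
    decreasing = all(a >= b for a, b in zip(data, data[1:]))
    return {
        "std": std,
        "cv": std / mean,
        "mad": mad,
        "skewness": skew,
        "monotonic": increasing or decreasing,
    }


print(describe([1.0, 1.0, 1.0, 10.0]))
```

For `[1, 1, 1, 10]` the standard deviation is 4.5, the MAD is 0 (three of four values equal the median) and G1 is 2, a strongly right-skewed sample, just as gbifID's skewness of 3.47 reflects its long upper tail.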
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1316557253 1
 
< 0.1%
1316984857 1
 
< 0.1%
1316984394 1
 
< 0.1%
3311030571 1
 
< 0.1%
1316984386 1
 
< 0.1%
1316984362 1
 
< 0.1%
1316984370 1
 
< 0.1%
1316984372 1
 
< 0.1%
1316984383 1
 
< 0.1%
1316984409 1
 
< 0.1%
Other values (724498) 724498
> 99.9%
ValueCountFrequency (%)
1316557246 1
< 0.1%
1316557247 1
< 0.1%
1316557248 1
< 0.1%
1316557249 1
< 0.1%
1316557250 1
< 0.1%
ValueCountFrequency (%)
4987259380 1
< 0.1%
4987259379 1
< 0.1%
4987259378 1
< 0.1%
4987259377 1
< 0.1%
4987259376 1
< 0.1%
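The histogram uses 50 equal-width bins spanning the minimum to the maximum. A minimal sketch of fixed-width binning, assuming the maximum value is placed in the last bin (as NumPy's `histogram` does); the handling of a constant column is this sketch's own choice:

```python
from typing import List, Sequence


def fixed_width_histogram(data: Sequence[float], bins: int = 50) -> List[int]:
    """Count values into `bins` equal-width bins between min(data) and max(data)."""
    lo, hi = min(data), max(data)
    counts = [0] * bins
    if hi == lo:  # all values identical: put everything in the first bin
        counts[0] = len(data)
        return counts
    width = (hi - lo) / bins
    for x in data:
        # The maximum would land at index `bins`; clamp it into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


print(fixed_width_histogram(range(10), bins=5))  # [2, 2, 2, 2, 2]
```

For gbifID each bin is about 73.4 million wide (range 3670702134 / 50), while Q3 sits only about 543 thousand above the minimum, so at least three quarters of the values fall in the first bin.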

license
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 5071556
Distinct characters: 4
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CC0_1_0
2nd row: CC0_1_0
3rd row: CC0_1_0
4th row: CC0_1_0
5th row: CC0_1_0
ValueCountFrequency (%)
cc0_1_0 724508
100.0%

Most occurring characters

ValueCountFrequency (%)
C 1449016
28.6%
0 1449016
28.6%
_ 1449016
28.6%
1 724508
14.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2173524
42.9%
Uppercase Letter 1449016
28.6%
Connector Punctuation 1449016
28.6%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1449016
66.7%
1 724508
33.3%
Uppercase Letter
ValueCountFrequency (%)
C 1449016
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1449016
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 3622540
71.4%
Latin 1449016
 
28.6%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1449016
40.0%
_ 1449016
40.0%
1 724508
20.0%
Latin
ValueCountFrequency (%)
C 1449016
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5071556
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C 1449016
28.6%
0 1449016
28.6%
_ 1449016
28.6%
1 724508
14.3%
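The character tables group characters by their Unicode general category. A stdlib sketch using `unicodedata.category`; the mapping from two-letter codes to the names printed in this report is written by hand and covers only the categories that appear here:

```python
import unicodedata
from collections import Counter
from typing import Dict, Iterable

# Two-letter Unicode general-category codes -> names used in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(values: Iterable[str]) -> Dict[str, int]:
    """Count characters per Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return dict(counts)


print(category_counts(["CC0_1_0"]))
```

One "CC0_1_0" value holds 3 decimal digits, 2 uppercase letters and 2 underscores; multiplied by 724508 rows these give the 2173524, 1449016 and 1449016 totals in the license table.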
Distinct: 6008
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 20
Median length: 20
Mean length: 20
Min length: 20

Characters and Unicode

Total characters: 14490160
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1783
Unique (%): 0.2%

Sample

1st row: 2014-11-25T18:32:00Z
2nd row: 2024-10-17T09:58:00Z
3rd row: 2024-10-17T10:44:00Z
4th row: 2024-08-03T21:41:00Z
5th row: 2024-10-17T10:17:00Z
ValueCountFrequency (%)
2024-08-03t22:06:00z 11077
 
1.5%
2024-08-03t22:09:00z 9194
 
1.3%
2024-08-03t22:08:00z 6946
 
1.0%
2024-11-18t11:29:00z 6500
 
0.9%
2024-11-18t11:28:00z 6488
 
0.9%
2024-10-17t10:55:00z 6364
 
0.9%
2024-10-17t10:57:00z 6355
 
0.9%
2024-10-17t10:29:00z 6348
 
0.9%
2024-10-17t10:28:00z 6344
 
0.9%
2024-10-17t10:56:00z 6343
 
0.9%
Other values (5998) 652549
90.1%
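The values in this variable are ISO 8601 UTC timestamps stored as text, so they are profiled as strings. Parsing them into timezone-aware datetimes enables date arithmetic and grouping. A stdlib sketch, assuming every value has the exact YYYY-MM-DDTHH:MM:SSZ shape shown in the sample rows:

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def parse_utc(value: str) -> datetime:
    """Parse 'YYYY-MM-DDTHH:MM:SSZ' into a timezone-aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def counts_per_year(values: Iterable[str]) -> Counter:
    """Number of records per calendar year of the timestamp."""
    return Counter(parse_utc(v).year for v in values)


sample = ["2014-11-25T18:32:00Z", "2024-10-17T09:58:00Z", "2024-08-03T21:41:00Z"]
print(counts_per_year(sample))  # Counter({2024: 2, 2014: 1})
```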

Most occurring characters

ValueCountFrequency (%)
0 3567224
24.6%
1 2229486
15.4%
2 1840704
12.7%
- 1449016
10.0%
: 1449016
10.0%
4 856419
 
5.9%
T 724508
 
5.0%
Z 724508
 
5.0%
7 523431
 
3.6%
3 323301
 
2.2%
Other values (4) 802547
 
5.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 10143112
70.0%
Dash Punctuation 1449016
 
10.0%
Other Punctuation 1449016
 
10.0%
Uppercase Letter 1449016
 
10.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 3567224
35.2%
1 2229486
22.0%
2 1840704
18.1%
4 856419
 
8.4%
7 523431
 
5.2%
3 323301
 
3.2%
8 267407
 
2.6%
5 251997
 
2.5%
9 156334
 
1.5%
6 126809
 
1.3%
Uppercase Letter
ValueCountFrequency (%)
T 724508
50.0%
Z 724508
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 1449016
100.0%
Other Punctuation
ValueCountFrequency (%)
: 1449016
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 13041144
90.0%
Latin 1449016
 
10.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 3567224
27.4%
1 2229486
17.1%
2 1840704
14.1%
- 1449016
11.1%
: 1449016
11.1%
4 856419
 
6.6%
7 523431
 
4.0%
3 323301
 
2.5%
8 267407
 
2.1%
5 251997
 
1.9%
Other values (2) 283143
 
2.2%
Latin
ValueCountFrequency (%)
T 724508
50.0%
Z 724508
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 14490160
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 3567224
24.6%
1 2229486
15.4%
2 1840704
12.7%
- 1449016
10.0%
: 1449016
10.0%
4 856419
 
5.9%
T 724508
 
5.0%
Z 724508
 
5.0%
7 523431
 
3.6%
3 323301
 
2.2%
Other values (4) 802547
 
5.5%

publisher
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 59
Median length: 59
Mean length: 59
Min length: 59

Characters and Unicode

Total characters: 42745972
Distinct characters: 21
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: National Museum of Natural History, Smithsonian Institution
2nd row: National Museum of Natural History, Smithsonian Institution
3rd row: National Museum of Natural History, Smithsonian Institution
4th row: National Museum of Natural History, Smithsonian Institution
5th row: National Museum of Natural History, Smithsonian Institution
ValueCountFrequency (%)
national 724508
14.3%
museum 724508
14.3%
of 724508
14.3%
natural 724508
14.3%
history 724508
14.3%
smithsonian 724508
14.3%
institution 724508
14.3%
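The value table for this text column counts words rather than whole values. Each value appears to be lowercased, split on whitespace and stripped of surrounding punctuation, which is why "History," is counted as "history" and each of the seven words gets 14.3%. A sketch of that tokenisation, inferred from the tables rather than taken from ydata-profiling's code:

```python
import string
from collections import Counter
from typing import Iterable


def word_counts(values: Iterable[str]) -> Counter:
    """Lowercase, split on whitespace, strip edge punctuation, then count words."""
    counts: Counter = Counter()
    for value in values:
        for token in value.lower().split():
            word = token.strip(string.punctuation)
            if word:
                counts[word] += 1
    return counts


words = word_counts(["National Museum of Natural History, Smithsonian Institution"])
total = sum(words.values())
for word, n in words.items():
    print(word, n, f"{100 * n / total:.1f}%")  # each of the 7 words: 14.3%
```

The same rule explains other tables in this report: "(USNM)" in datasetName becomes "usnm", while a URN with no edge punctuation stays a single token.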

Most occurring characters

ValueCountFrequency (%)
t 5071556
11.9%
i 4347048
10.2%
(space) 4347048
10.2%
a 3622540
 
8.5%
o 3622540
 
8.5%
n 3622540
 
8.5%
s 2898032
 
6.8%
u 2898032
 
6.8%
r 1449016
 
3.4%
m 1449016
 
3.4%
Other values (11) 9418604
22.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 33327368
78.0%
Space Separator 4347048
 
10.2%
Uppercase Letter 4347048
 
10.2%
Other Punctuation 724508
 
1.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 5071556
15.2%
i 4347048
13.0%
a 3622540
10.9%
o 3622540
10.9%
n 3622540
10.9%
s 2898032
8.7%
u 2898032
8.7%
r 1449016
 
4.3%
m 1449016
 
4.3%
l 1449016
 
4.3%
Other values (4) 2898032
8.7%
Uppercase Letter
ValueCountFrequency (%)
N 1449016
33.3%
M 724508
16.7%
H 724508
16.7%
S 724508
16.7%
I 724508
16.7%
Space Separator
ValueCountFrequency (%)
(space) 4347048
100.0%
Other Punctuation
ValueCountFrequency (%)
, 724508
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 37674416
88.1%
Common 5071556
 
11.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 5071556
13.5%
i 4347048
11.5%
a 3622540
9.6%
o 3622540
9.6%
n 3622540
9.6%
s 2898032
 
7.7%
u 2898032
 
7.7%
r 1449016
 
3.8%
m 1449016
 
3.8%
N 1449016
 
3.8%
Other values (9) 7245080
19.2%
Common
ValueCountFrequency (%)
(space) 4347048
85.7%
, 724508
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 42745972
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 5071556
11.9%
i 4347048
10.2%
(space) 4347048
10.2%
a 3622540
 
8.5%
o 3622540
 
8.5%
n 3622540
 
8.5%
s 2898032
 
6.8%
u 2898032
 
6.8%
r 1449016
 
3.4%
m 1449016
 
3.4%
Other values (11) 9418604
22.0%

institutionID
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 47
Median length: 47
Mean length: 47
Min length: 47

Characters and Unicode

Total characters: 34051876
Distinct characters: 22
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: http://biocol.org/urn:lsid:biocol.org:col:34871
2nd row: http://biocol.org/urn:lsid:biocol.org:col:34871
3rd row: http://biocol.org/urn:lsid:biocol.org:col:34871
4th row: http://biocol.org/urn:lsid:biocol.org:col:34871
5th row: http://biocol.org/urn:lsid:biocol.org:col:34871
ValueCountFrequency (%)
http://biocol.org/urn:lsid:biocol.org:col:34871 724508
100.0%
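institutionID is an LSID (Life Science Identifier) appended to a biocol.org HTTP URL. LSIDs have the form urn:lsid:authority:namespace:object with an optional :revision. A small parser, assuming the URN always appears verbatim somewhere in the value:

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(value: str) -> Lsid:
    """Extract the LSID parts from a bare URN or a URL that ends with one."""
    start = value.find("urn:lsid:")
    if start < 0:
        raise ValueError(f"no LSID in {value!r}")
    parts = value[start + len("urn:lsid:"):].split(":")
    if len(parts) not in (3, 4):
        raise ValueError(f"malformed LSID in {value!r}")
    revision = parts[3] if len(parts) == 4 else None
    return Lsid(parts[0], parts[1], parts[2], revision)


print(parse_lsid("http://biocol.org/urn:lsid:biocol.org:col:34871"))
# Lsid(authority='biocol.org', namespace='col', object_id='34871', revision=None)
```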

Most occurring characters

ValueCountFrequency (%)
o 5071556
14.9%
: 3622540
 
10.6%
l 2898032
 
8.5%
r 2173524
 
6.4%
/ 2173524
 
6.4%
i 2173524
 
6.4%
c 2173524
 
6.4%
b 1449016
 
4.3%
. 1449016
 
4.3%
t 1449016
 
4.3%
Other values (12) 9418604
27.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 23184256
68.1%
Other Punctuation 7245080
 
21.3%
Decimal Number 3622540
 
10.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 5071556
21.9%
l 2898032
12.5%
r 2173524
9.4%
i 2173524
9.4%
c 2173524
9.4%
b 1449016
 
6.2%
t 1449016
 
6.2%
g 1449016
 
6.2%
d 724508
 
3.1%
h 724508
 
3.1%
Other values (4) 2898032
12.5%
Decimal Number
ValueCountFrequency (%)
7 724508
20.0%
8 724508
20.0%
4 724508
20.0%
3 724508
20.0%
1 724508
20.0%
Other Punctuation
ValueCountFrequency (%)
: 3622540
50.0%
/ 2173524
30.0%
. 1449016
 
20.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 23184256
68.1%
Common 10867620
31.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 5071556
21.9%
l 2898032
12.5%
r 2173524
9.4%
i 2173524
9.4%
c 2173524
9.4%
b 1449016
 
6.2%
t 1449016
 
6.2%
g 1449016
 
6.2%
d 724508
 
3.1%
h 724508
 
3.1%
Other values (4) 2898032
12.5%
Common
ValueCountFrequency (%)
: 3622540
33.3%
/ 2173524
20.0%
. 1449016
 
13.3%
7 724508
 
6.7%
8 724508
 
6.7%
4 724508
 
6.7%
3 724508
 
6.7%
1 724508
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 34051876
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 5071556
14.9%
: 3622540
 
10.6%
l 2898032
 
8.5%
r 2173524
 
6.4%
/ 2173524
 
6.4%
i 2173524
 
6.4%
c 2173524
 
6.4%
b 1449016
 
4.3%
. 1449016
 
4.3%
t 1449016
 
4.3%
Other values (12) 9418604
27.7%

collectionID
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 44
Median length: 44
Mean length: 44
Min length: 44

Characters and Unicode

Total characters: 31878352
Distinct characters: 20
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac
2nd row: urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac
3rd row: urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac
4th row: urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac
5th row: urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac
ValueCountFrequency (%)
urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac 724508
100.0%
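collectionID is a urn:uuid URN, but as recorded its last group has 11 hexadecimal digits instead of the 12 an RFC 4122 UUID requires (31 digits in total rather than 32; this also matches the 44-character length above). Python's `uuid` module rejects it, so a validation pass like the sketch below would flag the value. This report does not show what the intended identifier is.

```python
import uuid


def is_valid_urn_uuid(value: str) -> bool:
    """True when `value` is 'urn:uuid:' followed by a well-formed UUID."""
    prefix = "urn:uuid:"
    if not value.lower().startswith(prefix):
        return False
    try:
        uuid.UUID(value[len(prefix):])
    except ValueError:
        return False
    return True


print(is_valid_urn_uuid("urn:uuid:ce595e88-ceba-42c0-a3ff-cd55b694fac"))  # False
```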

Most occurring characters

ValueCountFrequency (%)
c 3622540
 
11.4%
- 2898032
 
9.1%
5 2898032
 
9.1%
u 2173524
 
6.8%
f 2173524
 
6.8%
a 2173524
 
6.8%
e 2173524
 
6.8%
4 1449016
 
4.5%
b 1449016
 
4.5%
8 1449016
 
4.5%
Other values (10) 9418604
29.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 17388192
54.5%
Decimal Number 10143112
31.8%
Dash Punctuation 2898032
 
9.1%
Other Punctuation 1449016
 
4.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
c 3622540
20.8%
u 2173524
12.5%
f 2173524
12.5%
a 2173524
12.5%
e 2173524
12.5%
b 1449016
 
8.3%
d 1449016
 
8.3%
r 724508
 
4.2%
i 724508
 
4.2%
n 724508
 
4.2%
Decimal Number
ValueCountFrequency (%)
5 2898032
28.6%
4 1449016
14.3%
8 1449016
14.3%
9 1449016
14.3%
2 724508
 
7.1%
0 724508
 
7.1%
3 724508
 
7.1%
6 724508
 
7.1%
Dash Punctuation
ValueCountFrequency (%)
- 2898032
100.0%
Other Punctuation
ValueCountFrequency (%)
: 1449016
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 17388192
54.5%
Common 14490160
45.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
c 3622540
20.8%
u 2173524
12.5%
f 2173524
12.5%
a 2173524
12.5%
e 2173524
12.5%
b 1449016
 
8.3%
d 1449016
 
8.3%
r 724508
 
4.2%
i 724508
 
4.2%
n 724508
 
4.2%
Common
ValueCountFrequency (%)
- 2898032
20.0%
5 2898032
20.0%
4 1449016
10.0%
8 1449016
10.0%
9 1449016
10.0%
: 1449016
10.0%
2 724508
 
5.0%
0 724508
 
5.0%
3 724508
 
5.0%
6 724508
 
5.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 31878352
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
c 3622540
 
11.4%
- 2898032
 
9.1%
5 2898032
 
9.1%
u 2173524
 
6.8%
f 2173524
 
6.8%
a 2173524
 
6.8%
e 2173524
 
6.8%
4 1449016
 
4.5%
b 1449016
 
4.5%
8 1449016
 
4.5%
Other values (10) 9418604
29.5%

institutionCode
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 2898032
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: USNM
2nd row: USNM
3rd row: USNM
4th row: USNM
5th row: USNM
ValueCountFrequency (%)
usnm 724508
100.0%

Most occurring characters

ValueCountFrequency (%)
U 724508
25.0%
S 724508
25.0%
N 724508
25.0%
M 724508
25.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 2898032
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
U 724508
25.0%
S 724508
25.0%
N 724508
25.0%
M 724508
25.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2898032
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
U 724508
25.0%
S 724508
25.0%
N 724508
25.0%
M 724508
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2898032
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
U 724508
25.0%
S 724508
25.0%
N 724508
25.0%
M 724508
25.0%

collectionCode
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2173524
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PAL
2nd row: PAL
3rd row: PAL
4th row: PAL
5th row: PAL
ValueCountFrequency (%)
pal 724508
100.0%

Most occurring characters

ValueCountFrequency (%)
P 724508
33.3%
A 724508
33.3%
L 724508
33.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 2173524
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
P 724508
33.3%
A 724508
33.3%
L 724508
33.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 2173524
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
P 724508
33.3%
A 724508
33.3%
L 724508
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2173524
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
P 724508
33.3%
A 724508
33.3%
L 724508
33.3%

datasetName
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 24
Median length: 24
Mean length: 24
Min length: 24

Characters and Unicode

Total characters: 17388192
Distinct characters: 17
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NMNH Paleobiology (USNM)
2nd row: NMNH Paleobiology (USNM)
3rd row: NMNH Paleobiology (USNM)
4th row: NMNH Paleobiology (USNM)
5th row: NMNH Paleobiology (USNM)
ValueCountFrequency (%)
nmnh 724508
33.3%
paleobiology 724508
33.3%
usnm 724508
33.3%

Most occurring characters

ValueCountFrequency (%)
N 2173524
12.5%
o 2173524
12.5%
(space) 1449016
 
8.3%
l 1449016
 
8.3%
M 1449016
 
8.3%
H 724508
 
4.2%
P 724508
 
4.2%
a 724508
 
4.2%
e 724508
 
4.2%
b 724508
 
4.2%
Other values (7) 5071556
29.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7969588
45.8%
Uppercase Letter 6520572
37.5%
Space Separator 1449016
 
8.3%
Open Punctuation 724508
 
4.2%
Close Punctuation 724508
 
4.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 2173524
27.3%
l 1449016
18.2%
a 724508
 
9.1%
e 724508
 
9.1%
b 724508
 
9.1%
i 724508
 
9.1%
g 724508
 
9.1%
y 724508
 
9.1%
Uppercase Letter
ValueCountFrequency (%)
N 2173524
33.3%
M 1449016
22.2%
H 724508
 
11.1%
P 724508
 
11.1%
U 724508
 
11.1%
S 724508
 
11.1%
Space Separator
ValueCountFrequency (%)
(space) 1449016
100.0%
Open Punctuation
ValueCountFrequency (%)
( 724508
100.0%
Close Punctuation
ValueCountFrequency (%)
) 724508
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 14490160
83.3%
Common 2898032
 
16.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 2173524
15.0%
o 2173524
15.0%
l 1449016
10.0%
M 1449016
10.0%
H 724508
 
5.0%
P 724508
 
5.0%
a 724508
 
5.0%
e 724508
 
5.0%
b 724508
 
5.0%
i 724508
 
5.0%
Other values (4) 2898032
20.0%
Common
ValueCountFrequency (%)
(space) 1449016
50.0%
( 724508
25.0%
) 724508
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 17388192
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 2173524
12.5%
o 2173524
12.5%
(space) 1449016
 
8.3%
l 1449016
 
8.3%
M 1449016
 
8.3%
H 724508
 
4.2%
P 724508
 
4.2%
a 724508
 
4.2%
e 724508
 
4.2%
b 724508
 
4.2%
Other values (7) 5071556
29.2%

basisOfRecord
Text

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 15
Median length: 15
Mean length: 15
Min length: 15

Characters and Unicode

Total characters: 10867620
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: FOSSIL_SPECIMEN
2nd row: FOSSIL_SPECIMEN
3rd row: FOSSIL_SPECIMEN
4th row: FOSSIL_SPECIMEN
5th row: FOSSIL_SPECIMEN
ValueCountFrequency (%)
fossil_specimen 724508
100.0%

Most occurring characters

ValueCountFrequency (%)
S 2173524
20.0%
I 1449016
13.3%
E 1449016
13.3%
F 724508
 
6.7%
O 724508
 
6.7%
L 724508
 
6.7%
_ 724508
 
6.7%
P 724508
 
6.7%
C 724508
 
6.7%
M 724508
 
6.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 10143112
93.3%
Connector Punctuation 724508
 
6.7%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 2173524
21.4%
I 1449016
14.3%
E 1449016
14.3%
F 724508
 
7.1%
O 724508
 
7.1%
L 724508
 
7.1%
P 724508
 
7.1%
C 724508
 
7.1%
M 724508
 
7.1%
N 724508
 
7.1%
Connector Punctuation
ValueCountFrequency (%)
_ 724508
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 10143112
93.3%
Common 724508
 
6.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 2173524
21.4%
I 1449016
14.3%
E 1449016
14.3%
F 724508
 
7.1%
O 724508
 
7.1%
L 724508
 
7.1%
P 724508
 
7.1%
C 724508
 
7.1%
M 724508
 
7.1%
N 724508
 
7.1%
Common
ValueCountFrequency (%)
_ 724508
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10867620
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S 2173524
20.0%
I 1449016
13.3%
E 1449016
13.3%
F 724508
 
6.7%
O 724508
 
6.7%
L 724508
 
6.7%
_ 724508
 
6.7%
P 724508
 
6.7%
C 724508
 
6.7%
M 724508
 
6.7%

occurrenceID
Text

Unique 

Distinct: 724508
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 63
Median length: 63
Mean length: 63
Min length: 63

Characters and Unicode

Total characters: 45644004
Distinct characters: 26
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 724508
Unique (%): 100.0%

Sample

1st row: http://n2t.net/ark:/65665/300009e1e-4f3e-4240-b198-9ea1352b28b5
2nd row: http://n2t.net/ark:/65665/30000a59d-34e5-42b6-837d-ad1b89b6b930
3rd row: http://n2t.net/ark:/65665/3000109b9-b6d6-4ca0-8f0c-ddde53458300
4th row: http://n2t.net/ark:/65665/30001bcd8-61d5-492a-ad56-f8131f24bdaa
5th row: http://n2t.net/ark:/65665/300020a6b-970f-4e44-adb4-6d605be80b0d
ValueCountFrequency (%)
http://n2t.net/ark:/65665/300009e1e-4f3e-4240-b198-9ea1352b28b5 1
 
< 0.1%
http://n2t.net/ark:/65665/3004266bd-f222-4227-9817-5905ac4cbc57 1
 
< 0.1%
http://n2t.net/ark:/65665/30011b937-0eb9-4c75-bea7-c27393598b76 1
 
< 0.1%
http://n2t.net/ark:/65665/3002cb891-3b1b-49d8-84ee-8558aba9bf13 1
 
< 0.1%
http://n2t.net/ark:/65665/3000a6387-0469-4278-8ac0-fb0ac6fd37d6 1
 
< 0.1%
http://n2t.net/ark:/65665/3000109b9-b6d6-4ca0-8f0c-ddde53458300 1
 
< 0.1%
http://n2t.net/ark:/65665/30001bcd8-61d5-492a-ad56-f8131f24bdaa 1
 
< 0.1%
http://n2t.net/ark:/65665/300020a6b-970f-4e44-adb4-6d605be80b0d 1
 
< 0.1%
http://n2t.net/ark:/65665/300045523-2307-4a34-b888-fb51510870ad 1
 
< 0.1%
http://n2t.net/ark:/65665/300045db2-681e-481a-836e-3643bf3debbf 1
 
< 0.1%
Other values (724498) 724498
> 99.9%
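Each occurrenceID is an ARK (Archival Resource Key) URL resolved through n2t.net, with the shape ark:/NAAN/name; every sampled value uses NAAN 65665. A sketch that splits an identifier into resolver, NAAN and name, assuming that shape holds for all rows:

```python
from typing import NamedTuple


class Ark(NamedTuple):
    resolver: str
    naan: str
    name: str


def parse_ark(url: str) -> Ark:
    """Split 'http://host/ark:/NAAN/name' into its resolver, NAAN and name."""
    resolver, sep, rest = url.partition("/ark:/")
    if not sep:
        raise ValueError(f"not an ARK URL: {url!r}")
    naan, slash, name = rest.partition("/")
    if not slash or not name:
        raise ValueError(f"missing ARK name: {url!r}")
    return Ark(resolver, naan, name)


ark = parse_ark("http://n2t.net/ark:/65665/300009e1e-4f3e-4240-b198-9ea1352b28b5")
print(ark.naan, ark.name)  # 65665 300009e1e-4f3e-4240-b198-9ea1352b28b5
```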

Most occurring characters

ValueCountFrequency (%)
/ 3622540
 
7.9%
6 3531516
 
7.7%
- 2898032
 
6.3%
t 2898032
 
6.3%
5 2808306
 
6.2%
a 2263386
 
5.0%
e 2084462
 
4.6%
2 2083197
 
4.6%
3 2083153
 
4.6%
4 2081137
 
4.6%
Other values (16) 19290243
42.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 19743301
43.3%
Lowercase Letter 17206607
37.7%
Other Punctuation 5796064
 
12.7%
Dash Punctuation 2898032
 
6.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 2898032
16.8%
a 2263386
13.2%
e 2084462
12.1%
b 1539404
8.9%
n 1449016
8.4%
c 1358538
7.9%
d 1358025
7.9%
f 1357712
7.9%
k 724508
 
4.2%
r 724508
 
4.2%
Other values (2) 1449016
8.4%
Decimal Number
ValueCountFrequency (%)
6 3531516
17.9%
5 2808306
14.2%
2 2083197
10.6%
3 2083153
10.6%
4 2081137
10.5%
8 1539173
7.8%
9 1539102
7.8%
0 1359375
 
6.9%
7 1359374
 
6.9%
1 1358968
 
6.9%
Other Punctuation
ValueCountFrequency (%)
/ 3622540
62.5%
: 1449016
 
25.0%
. 724508
 
12.5%
Dash Punctuation
ValueCountFrequency (%)
- 2898032
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 28437397
62.3%
Latin 17206607
37.7%

Most frequent character per script

Common
ValueCountFrequency (%)
/ 3622540
12.7%
6 3531516
12.4%
- 2898032
10.2%
5 2808306
9.9%
2 2083197
7.3%
3 2083153
7.3%
4 2081137
7.3%
8 1539173
 
5.4%
9 1539102
 
5.4%
: 1449016
 
5.1%
Other values (4) 4802225
16.9%
Latin
ValueCountFrequency (%)
t 2898032
16.8%
a 2263386
13.2%
e 2084462
12.1%
b 1539404
8.9%
n 1449016
8.4%
c 1358538
7.9%
d 1358025
7.9%
f 1357712
7.9%
k 724508
 
4.2%
r 724508
 
4.2%
Other values (2) 1449016
8.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 45644004
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
/ 3622540
 
7.9%
6 3531516
 
7.7%
- 2898032
 
6.3%
t 2898032
 
6.3%
5 2808306
 
6.2%
a 2263386
 
5.0%
e 2084462
 
4.6%
2 2083197
 
4.6%
3 2083153
 
4.6%
4 2081137
 
4.6%
Other values (16) 19290243
42.3%

catalogNumber
Text

Missing 

Distinct: 655081
Distinct (%): 97.2%
Missing: 50535
Missing (%): 7.0%
Memory size: 5.5 MiB

Length

Max length21
Median length14
Mean length13.86868317
Min length7

Characters and Unicode

Total characters9347118
Distinct characters68
Distinct categories7
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
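Per-category counts like the ones in these tables can be reproduced in spirit with Python's standard `unicodedata` module, whose `category()` function returns the two-letter Unicode general category of a character (`Lu`, `Nd`, `Zs`, and so on). The sketch below is a minimal illustration under that assumption, not the profiler's own code; the mapping to long names is written out by hand and covers only the categories that occur in this report.

```python
import unicodedata
from collections import Counter

# Hand-written subset of Unicode general categories and the long names used
# in this report; any other code falls back to the raw two-letter code.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pc": "Connector Punctuation",
    "Pi": "Initial Punctuation",
    "Pf": "Final Punctuation",
    "Sm": "Math Symbol",
    "Sc": "Currency Symbol",
}


def category_counts(values):
    """Count the characters of all values by Unicode general category."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    # Two of the catalogNumber sample rows shown below.
    print(category_counts(["USNM SD38013 0000", "USNM PAL706968"]))
```

Note that `|`, `+`, `<`, `>` and `=` all fall under Math Symbol (`Sm`), which is why they are grouped that way in the tables.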

Unique

Unique638257
Unique (%)94.7%

Sample

1st rowUSNM SD38013 0000
2nd rowUSNM PAL706968
3rd rowUSNM PAL248638
4th rowUSNM PAL456768
5th rowUSNM PAL297724
ValueCountFrequency (%)
usnm 673973
47.8%
0000 59177
 
4.2%
0002 159
 
< 0.1%
0001 159
 
< 0.1%
0003 149
 
< 0.1%
0004 145
 
< 0.1%
0005 137
 
< 0.1%
0006 116
 
< 0.1%
0007 113
 
< 0.1%
0008 105
 
< 0.1%
Other values (652937) 674632
47.9%
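The value counts in this table are tallied per token rather than per catalog number, which is why `usnm` and `0000` appear as separate entries. The figures are consistent with each value being lowercased and split on whitespace: the 673,973 `usnm` tokens equal the number of non-missing values (724,508 - 50,535), and the token total (1,408,865) exceeds that by exactly the 734,892 spaces in the character counts. A minimal Python sketch of this assumed tokenization, checked against the sample rows and against the table's own totals:

```python
from collections import Counter


def token_counts(values):
    """Lowercase each value, split it on whitespace and count the tokens."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


# The five sample rows shown for catalogNumber.
sample = [
    "USNM SD38013 0000",
    "USNM PAL706968",
    "USNM PAL248638",
    "USNM PAL456768",
    "USNM PAL297724",
]
counts = token_counts(sample)
n_tokens = sum(counts.values())
n_spaces = sum(value.count(" ") for value in sample)

# Each value yields one "usnm" token; with single spaces, tokens - values == spaces.
print(counts["usnm"], n_tokens, n_spaces)

# The same identity applied to the full-column figures in the table above.
non_missing = 724_508 - 50_535
table_total = (673_973 + 59_177 + 159 + 159 + 149 + 145
               + 137 + 116 + 113 + 105 + 674_632)
print(non_missing, table_total, table_total - non_missing)
```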

Most occurring characters

ValueCountFrequency (%)
S 742844
 
7.9%
(space) 734892
 
7.9%
M 712585
 
7.6%
N 674519
 
7.2%
U 674214
 
7.2%
0 557394
 
6.0%
P 521957
 
5.6%
A 511374
 
5.5%
L 497601
 
5.3%
1 444334
 
4.8%
Other values (58) 3275404
35.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 4546936
48.6%
Decimal Number 4063828
43.5%
Space Separator 734892
 
7.9%
Other Punctuation 741
 
< 0.1%
Lowercase Letter 690
 
< 0.1%
Dash Punctuation 30
 
< 0.1%
Math Symbol 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 742844
16.3%
M 712585
15.7%
N 674519
14.8%
U 674214
14.8%
P 521957
11.5%
A 511374
11.2%
L 497601
10.9%
D 65264
 
1.4%
C 43992
 
1.0%
O 38427
 
0.8%
Other values (16) 64159
 
1.4%
Lowercase Letter
ValueCountFrequency (%)
a 130
18.8%
b 126
18.3%
d 61
8.8%
e 54
7.8%
c 50
 
7.2%
o 38
 
5.5%
l 31
 
4.5%
f 27
 
3.9%
r 26
 
3.8%
k 23
 
3.3%
Other values (16) 124
18.0%
Decimal Number
ValueCountFrequency (%)
0 557394
13.7%
1 444334
10.9%
3 432709
10.6%
5 423320
10.4%
2 419515
10.3%
4 412173
10.1%
6 395612
9.7%
7 350867
8.6%
8 318934
7.8%
9 308970
7.6%
Other Punctuation
ValueCountFrequency (%)
' 704
95.0%
" 35
 
4.7%
, 2
 
0.3%
Space Separator
ValueCountFrequency (%)
(space) 734892
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 30
100.0%
Math Symbol
ValueCountFrequency (%)
+ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 4799492
51.3%
Latin 4547626
48.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 742844
16.3%
M 712585
15.7%
N 674519
14.8%
U 674214
14.8%
P 521957
11.5%
A 511374
11.2%
L 497601
10.9%
D 65264
 
1.4%
C 43992
 
1.0%
O 38427
 
0.8%
Other values (42) 64849
 
1.4%
Common
ValueCountFrequency (%)
(space) 734892
15.3%
0 557394
11.6%
1 444334
9.3%
3 432709
9.0%
5 423320
8.8%
2 419515
8.7%
4 412173
8.6%
6 395612
8.2%
7 350867
7.3%
8 318934
6.6%
Other values (6) 309742
6.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9347118
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S 742844
 
7.9%
(space) 734892
 
7.9%
M 712585
 
7.6%
N 674519
 
7.2%
U 674214
 
7.2%
0 557394
 
6.0%
P 521957
 
5.6%
A 511374
 
5.5%
L 497601
 
5.3%
1 444334
 
4.8%
Other values (58) 3275404
35.0%

recordNumber
Text

Missing 

Distinct39872
Distinct (%)82.1%
Missing675939
Missing (%)93.3%
Memory size5.5 MiB

Length

Max length48
Median length5
Mean length6.205336737
Min length1

Characters and Unicode

Total characters301387
Distinct characters77
Distinct categories10
Distinct scripts2
Distinct blocks1

Unique

Unique37721
Unique (%)77.7%

Sample

1st rowPALMER LOC 1479
2nd row75432
3rd rowH-11
4th rowE73-59
5th rowGaxin Loc 178-36
ValueCountFrequency (%)
loc 1685
 
2.9%
emlong 951
 
1.7%
urbac 803
 
1.4%
olson 263
 
0.5%
sample 209
 
0.4%
hass 177
 
0.3%
rb 171
 
0.3%
c-29 169
 
0.3%
gibson 163
 
0.3%
wyo 162
 
0.3%
Other values (38506) 52476
91.7%

Most occurring characters

ValueCountFrequency (%)
1 30021
 
10.0%
5 27939
 
9.3%
7 23690
 
7.9%
2 21570
 
7.2%
3 20657
 
6.9%
6 18998
 
6.3%
8 18791
 
6.2%
0 17388
 
5.8%
4 17006
 
5.6%
- 16559
 
5.5%
Other values (67) 88768
29.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 211386
70.1%
Uppercase Letter 58763
 
19.5%
Dash Punctuation 16559
 
5.5%
Space Separator 8660
 
2.9%
Other Punctuation 3199
 
1.1%
Lowercase Letter 2471
 
0.8%
Math Symbol 145
 
< 0.1%
Close Punctuation 102
 
< 0.1%
Open Punctuation 101
 
< 0.1%
Connector Punctuation 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
O 5593
 
9.5%
E 4986
 
8.5%
L 4981
 
8.5%
C 4891
 
8.3%
S 4262
 
7.3%
A 4151
 
7.1%
M 3190
 
5.4%
R 3078
 
5.2%
N 3020
 
5.1%
B 2373
 
4.0%
Other values (16) 18238
31.0%
Lowercase Letter
ValueCountFrequency (%)
o 425
17.2%
n 315
12.7%
a 217
8.8%
y 190
7.7%
l 189
7.6%
c 189
7.6%
e 172
7.0%
i 169
 
6.8%
r 167
 
6.8%
t 82
 
3.3%
Other values (14) 356
14.4%
Decimal Number
ValueCountFrequency (%)
1 30021
14.2%
5 27939
13.2%
7 23690
11.2%
2 21570
10.2%
3 20657
9.8%
6 18998
9.0%
8 18791
8.9%
0 17388
8.2%
4 17006
8.0%
9 15326
7.3%
Other Punctuation
ValueCountFrequency (%)
/ 1630
51.0%
. 955
29.9%
, 516
 
16.1%
? 56
 
1.8%
' 22
 
0.7%
; 12
 
0.4%
# 5
 
0.2%
: 2
 
0.1%
& 1
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
+ 135
93.1%
= 10
 
6.9%
Close Punctuation
ValueCountFrequency (%)
) 100
98.0%
} 2
 
2.0%
Dash Punctuation
ValueCountFrequency (%)
- 16559
100.0%
Space Separator
ValueCountFrequency (%)
(space) 8660
100.0%
Open Punctuation
ValueCountFrequency (%)
( 101
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 240153
79.7%
Latin 61234
 
20.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
O 5593
 
9.1%
E 4986
 
8.1%
L 4981
 
8.1%
C 4891
 
8.0%
S 4262
 
7.0%
A 4151
 
6.8%
M 3190
 
5.2%
R 3078
 
5.0%
N 3020
 
4.9%
B 2373
 
3.9%
Other values (40) 20709
33.8%
Common
ValueCountFrequency (%)
1 30021
12.5%
5 27939
11.6%
7 23690
9.9%
2 21570
9.0%
3 20657
8.6%
6 18998
7.9%
8 18791
7.8%
0 17388
7.2%
4 17006
7.1%
- 16559
6.9%
Other values (17) 27534
11.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 301387
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 30021
 
10.0%
5 27939
 
9.3%
7 23690
 
7.9%
2 21570
 
7.2%
3 20657
 
6.9%
6 18998
 
6.3%
8 18791
 
6.2%
0 17388
 
5.8%
4 17006
 
5.6%
- 16559
 
5.5%
Other values (67) 88768
29.5%

recordedBy
Text

Missing 

Distinct3957
Distinct (%)2.5%
Missing563497
Missing (%)77.8%
Memory size5.5 MiB

Length

Max length119
Median length61
Mean length10.93147052
Min length1

Characters and Unicode

Total characters1760087
Distinct characters61
Distinct categories7
Distinct scripts2
Distinct blocks2

Unique

Unique1329
Unique (%)0.8%

Sample

1st rowR. Snow
2nd rowD. Palmer
3rd rowW. Woodring & L. Lupher
4th rowJames
5th rowRoss
ValueCountFrequency (%)
21228
 
6.1%
j 19727
 
5.7%
r 15376
 
4.5%
w 14249
 
4.1%
a 12060
 
3.5%
james 11468
 
3.3%
l 10757
 
3.1%
woodring 9356
 
2.7%
pribyl 8943
 
2.6%
c 7362
 
2.1%
Other values (2560) 214833
62.2%
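recordedBy often lists several collectors in one string joined with `&` (for example `W. Woodring & L. Lupher`). The token table above also looks punctuation-stripped: initials appear as `j`, `r`, `w` without their periods, and the blank first row has the same count (21,228) as the `&` character, which is what you would see if `&` tokens became empty strings after stripping. The sketch below assumes exactly that tokenization (lowercase, split on whitespace, strip surrounding ASCII punctuation); `split_collectors` is a hypothetical helper that splits on `&` only, leaving commas alone because they may belong to surname-first names.

```python
import string


def profile_tokens(value: str) -> list[str]:
    """Assumed tokenization: lowercase, split on whitespace, strip surrounding punctuation."""
    return [tok.strip(string.punctuation) for tok in value.lower().split()]


def split_collectors(value: str) -> list[str]:
    """Hypothetical helper: split a recordedBy string on '&' into collector names."""
    return [name.strip() for name in value.split("&") if name.strip()]


value = "W. Woodring & L. Lupher"  # sample row from the table above
print(profile_tokens(value))       # the '&' token is stripped down to ''
print(profile_tokens(value).count(""))
print(split_collectors(value))
```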

Most occurring characters

ValueCountFrequency (%)
(space) 184348
 
10.5%
e 133592
 
7.6%
. 131492
 
7.5%
r 102132
 
5.8%
o 91217
 
5.2%
l 89319
 
5.1%
n 89079
 
5.1%
a 84651
 
4.8%
i 80231
 
4.6%
s 70452
 
4.0%
Other values (51) 703574
40.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1075097
61.1%
Uppercase Letter 337569
 
19.2%
Space Separator 184348
 
10.5%
Other Punctuation 160539
 
9.1%
Dash Punctuation 2462
 
0.1%
Open Punctuation 36
 
< 0.1%
Close Punctuation 36
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 133592
12.4%
r 102132
9.5%
o 91217
 
8.5%
l 89319
 
8.3%
n 89079
 
8.3%
a 84651
 
7.9%
i 80231
 
7.5%
s 70452
 
6.6%
t 48464
 
4.5%
d 48173
 
4.5%
Other values (18) 237787
22.1%
Uppercase Letter
ValueCountFrequency (%)
J 36000
 
10.7%
W 33626
 
10.0%
A 27177
 
8.1%
R 24357
 
7.2%
P 20822
 
6.2%
C 20595
 
6.1%
M 19813
 
5.9%
S 19479
 
5.8%
L 18797
 
5.6%
H 15162
 
4.5%
Other values (15) 101741
30.1%
Other Punctuation
ValueCountFrequency (%)
. 131492
81.9%
& 21228
 
13.2%
, 7789
 
4.9%
' 30
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 184348
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2462
100.0%
Open Punctuation
ValueCountFrequency (%)
( 36
100.0%
Close Punctuation
ValueCountFrequency (%)
) 36
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1412666
80.3%
Common 347421
 
19.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 133592
 
9.5%
r 102132
 
7.2%
o 91217
 
6.5%
l 89319
 
6.3%
n 89079
 
6.3%
a 84651
 
6.0%
i 80231
 
5.7%
s 70452
 
5.0%
t 48464
 
3.4%
d 48173
 
3.4%
Other values (43) 575356
40.7%
Common
ValueCountFrequency (%)
(space) 184348
53.1%
. 131492
37.8%
& 21228
 
6.1%
, 7789
 
2.2%
- 2462
 
0.7%
( 36
 
< 0.1%
) 36
 
< 0.1%
' 30
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1760046
> 99.9%
None 41
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 184348
 
10.5%
e 133592
 
7.6%
. 131492
 
7.5%
r 102132
 
5.8%
o 91217
 
5.2%
l 89319
 
5.1%
n 89079
 
5.1%
a 84651
 
4.8%
i 80231
 
4.6%
s 70452
 
4.0%
Other values (49) 703533
40.0%
None
ValueCountFrequency (%)
ú 40
97.6%
č 1
 
2.4%

individualCount
Real number (ℝ)

Skewed 

Distinct686
Distinct (%)0.1%
Missing303
Missing (%)< 0.1%
Infinite0
Infinite (%)0.0%
Mean11.84197706
Minimum0
Maximum15000
Zeros158
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum0
5th percentile1
Q11
median1
Q31
95th percentile16
Maximum15000
Range15000
Interquartile range (IQR)0

Descriptive statistics

Standard deviation133.8531974
Coefficient of variation (CV)11.30328126
Kurtosis1553.85833
Mean11.84197706
Median Absolute Deviation (MAD)0
Skewness32.66226483
Sum8576019
Variance17916.67846
MonotonicityNot monotonic
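The descriptive statistics above are tied together by simple identities, which makes them easy to sanity-check: the mean is Sum divided by the number of non-missing values, the coefficient of variation (CV) is the standard deviation divided by the mean, and the variance is the square of the standard deviation. The sketch below recomputes these from figures copied out of this section; it checks the report's arithmetic only and says nothing about how the profiler computes them internally.

```python
import math

# Figures copied from the individualCount summary above.
n_rows = 724_508       # Number of observations (dataset overview)
n_missing = 303        # Missing
total = 8_576_019      # Sum
std = 133.8531974      # Standard deviation

n = n_rows - n_missing  # non-missing values
mean = total / n
cv = std / mean
variance = std ** 2

print(f"n={n} mean={mean:.6f} cv={cv:.6f} variance={variance:.2f}")
print(math.isclose(cv, 11.30328126, rel_tol=1e-6))
```

A CV above 11 together with an IQR of 0 reflects a column where more than 80% of values are 1 and a handful reach the thousands.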
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1 594864
82.1%
2 29629
 
4.1%
3 14673
 
2.0%
4 9858
 
1.4%
5 7420
 
1.0%
6 5780
 
0.8%
7 4510
 
0.6%
8 3695
 
0.5%
10 3151
 
0.4%
9 3129
 
0.4%
Other values (676) 47496
 
6.6%
ValueCountFrequency (%)
0 158
 
< 0.1%
1 594864
82.1%
2 29629
 
4.1%
3 14673
 
2.0%
4 9858
 
1.4%
ValueCountFrequency (%)
15000 1
 
< 0.1%
9999 2
 
< 0.1%
9942 1
 
< 0.1%
9000 8
< 0.1%
8000 5
< 0.1%

occurrenceStatus
Text

Constant 

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size5.5 MiB

Length

Max length7
Median length7
Mean length7
Min length7

Characters and Unicode

Total characters5071556
Distinct characters6
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowPRESENT
2nd rowPRESENT
3rd rowPRESENT
4th rowPRESENT
5th rowPRESENT
ValueCountFrequency (%)
present 724508
100.0%

Most occurring characters

ValueCountFrequency (%)
E 1449016
28.6%
P 724508
14.3%
R 724508
14.3%
S 724508
14.3%
N 724508
14.3%
T 724508
14.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 5071556
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 1449016
28.6%
P 724508
14.3%
R 724508
14.3%
S 724508
14.3%
N 724508
14.3%
T 724508
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 5071556
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 1449016
28.6%
P 724508
14.3%
R 724508
14.3%
S 724508
14.3%
N 724508
14.3%
T 724508
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5071556
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E 1449016
28.6%
P 724508
14.3%
R 724508
14.3%
S 724508
14.3%
N 724508
14.3%
T 724508
14.3%

preparations
Text

Missing 

Distinct381
Distinct (%)0.3%
Missing591600
Missing (%)81.7%
Memory size5.5 MiB

Length

Max length94
Median length91
Mean length16.14684594
Min length3

Characters and Unicode

Total characters2146045
Distinct characters51
Distinct categories6
Distinct scripts2
Distinct blocks1

Unique

Unique130
Unique (%)0.1%

Sample

1st rowBoxes and vials
2nd rowThin sections
3rd rowSecondary microslides
4th rowWet
5th rowplastic container
ValueCountFrequency (%)
microslide 45697
17.5%
microslides 34837
13.4%
secondary 33230
12.8%
remnants 26629
10.2%
thin 24547
9.4%
sections 24011
9.2%
no 15071
 
5.8%
with 10919
 
4.2%
unsectioned 9109
 
3.5%
bottle 3934
 
1.5%
Other values (53) 32636
12.5%

Most occurring characters

ValueCountFrequency (%)
i 236706
11.0%
s 211809
9.9%
e 210870
9.8%
n 172401
 
8.0%
o 167894
 
7.8%
c 147453
 
6.9%
r 146905
 
6.8%
d 130804
 
6.1%
(space) 127712
 
6.0%
l 92477
 
4.3%
Other values (41) 501014
23.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1849130
86.2%
Uppercase Letter 159097
 
7.4%
Space Separator 127712
 
6.0%
Other Punctuation 10096
 
0.5%
Open Punctuation 5
 
< 0.1%
Close Punctuation 5
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 236706
12.8%
s 211809
11.5%
e 210870
11.4%
n 172401
9.3%
o 167894
9.1%
c 147453
8.0%
r 146905
7.9%
d 130804
7.1%
l 92477
 
5.0%
t 85481
 
4.6%
Other values (14) 246330
13.3%
Uppercase Letter
ValueCountFrequency (%)
M 46146
29.0%
S 38065
23.9%
T 27401
17.2%
U 10261
 
6.4%
B 6095
 
3.8%
P 5926
 
3.7%
C 5880
 
3.7%
O 5094
 
3.2%
E 3082
 
1.9%
R 2197
 
1.4%
Other values (11) 8950
 
5.6%
Other Punctuation
ValueCountFrequency (%)
; 9850
97.6%
& 157
 
1.6%
/ 89
 
0.9%
Space Separator
ValueCountFrequency (%)
(space) 127712
100.0%
Open Punctuation
ValueCountFrequency (%)
( 5
100.0%
Close Punctuation
ValueCountFrequency (%)
) 5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2008227
93.6%
Common 137818
 
6.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 236706
11.8%
s 211809
10.5%
e 210870
10.5%
n 172401
8.6%
o 167894
8.4%
c 147453
 
7.3%
r 146905
 
7.3%
d 130804
 
6.5%
l 92477
 
4.6%
t 85481
 
4.3%
Other values (35) 405427
20.2%
Common
ValueCountFrequency (%)
(space) 127712
92.7%
; 9850
 
7.1%
& 157
 
0.1%
/ 89
 
0.1%
( 5
 
< 0.1%
) 5
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2146045
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 236706
11.0%
s 211809
9.9%
e 210870
9.8%
n 172401
 
8.0%
o 167894
 
7.8%
c 147453
 
6.9%
r 146905
 
6.8%
d 130804
 
6.1%
(space) 127712
 
6.0%
l 92477
 
4.3%
Other values (41) 501014
23.3%

occurrenceRemarks
Text

Missing 

Distinct38195
Distinct (%)44.3%
Missing638259
Missing (%)88.1%
Memory size5.5 MiB

Length

Max length1257
Median length1240
Mean length357.4557966
Min length5

Characters and Unicode

Total characters30830205
Distinct characters92
Distinct categories13
Distinct scripts2
Distinct blocks2

Unique

Unique36384
Unique (%)42.2%

Sample

1st rowSpecimen comments: Associated w/ #0343 and #0346. | Body size code: medium; Taphonomic Significance: Human modification | Features: Weathering, diagenesis: N/A; Burn Color: none; Burn Modification: none; Cut: 0; Scrape: 0; Chop: 0; Loading Notch: 0; Counterblow: 0; Anvil pit: 0; Carn pit: 0; Carn score: 0; Carn furrow: 0; Carn punct: 0; Carn crenulation: 0; Rodent gnaw: none
2nd rowEMu record was created as part of the Smithsonian Institution Digitization Program Office (SI DPO) mass digitization pilot project to support the National Science Foundation Advancing Digitization of Biodiversity Collections Eastern Pacific Invertebrates of the Cenozoic Collaborative Thematic Collections Network (NSF ADBC EPICC TCN). The SI DPO mass digitization pilot workflow includes crowdsourced label transcription through the SI Transcription Center.; Information generated by NMNH Department of Paleobiology volunteers: Specimen count and preliminary identification to class.
3rd rowEMu record was created as part of the Smithsonian Institution Digitization Program Office (SI DPO) mass digitization pilot project to support the National Science Foundation Advancing Digitization of Biodiversity Collections Eastern Pacific Invertebrates of the Cenozoic Collaborative Thematic Collections Network (NSF ADBC EPICC TCN). The SI DPO mass digitization pilot workflow includes crowdsourced label transcription through the SI Transcription Center.; Information generated by NMNH Department of Paleobiology volunteers: Specimen count and preliminary identification to class.
4th rowThe fossil is marked with the original Green River number and is often mistaken for the USNM number. That original Green River collection number is 75432.; Numbers associated with this fossil: 578683. 75432. 40193.
5th rowEMu record was created as part of the Smithsonian Institution Digitization Program Office (SI DPO) mass digitization pilot project to support the National Science Foundation Advancing Digitization of Biodiversity Collections Eastern Pacific Invertebrates of the Cenozoic Collaborative Thematic Collections Network (NSF ADBC EPICC TCN). The SI DPO mass digitization pilot workflow includes crowdsourced label transcription through the SI Transcription Center.; Additional label information: This locality is at approximately the same horizon as USGS CENO LOC 5686, in which a shale fauna was collected | See USGS CENO LOC 5703; Verbatim Lithostratigraphy: Tejon Formation; Sandstone forming the upper member of the Tejon | Discontinuous lenses in a soft brownish sandstone, less than 100 feet stratigraphically below the overlying diatomaceous shale; Verbatim Chronostratigraphy: Eocene
ValueCountFrequency (%)
the 291111
 
6.9%
digitization 174338
 
4.1%
of 164357
 
3.9%
si 100203
 
2.4%
collections 99405
 
2.4%
number 86263
 
2.0%
is 85833
 
2.0%
mass 74949
 
1.8%
dpo 74947
 
1.8%
with 57325
 
1.4%
Other values (66970) 3009589
71.3%

Most occurring characters

ValueCountFrequency (%)
(space) 4132071
 
13.4%
i 2608470
 
8.5%
t 2311910
 
7.5%
o 2139574
 
6.9%
e 2129723
 
6.9%
n 1708168
 
5.5%
a 1671073
 
5.4%
r 1554155
 
5.0%
s 1249854
 
4.1%
c 981043
 
3.2%
Other values (82) 10344164
33.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 22179429
71.9%
Space Separator 4132071
 
13.4%
Uppercase Letter 3027854
 
9.8%
Decimal Number 712264
 
2.3%
Other Punctuation 536260
 
1.7%
Open Punctuation 103223
 
0.3%
Close Punctuation 103221
 
0.3%
Math Symbol 26815
 
0.1%
Dash Punctuation 8726
 
< 0.1%
Connector Punctuation 335
 
< 0.1%
Other values (3) 7
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 2608470
11.8%
t 2311910
10.4%
o 2139574
9.6%
e 2129723
9.6%
n 1708168
 
7.7%
a 1671073
 
7.5%
r 1554155
 
7.0%
s 1249854
 
5.6%
c 981043
 
4.4%
l 809850
 
3.7%
Other values (16) 5015609
22.6%
Uppercase Letter
ValueCountFrequency (%)
C 475177
15.7%
S 312569
10.3%
N 284886
9.4%
I 260808
8.6%
P 248493
8.2%
D 239558
7.9%
T 217566
 
7.2%
E 157599
 
5.2%
A 134747
 
4.5%
O 129263
 
4.3%
Other values (16) 567188
18.7%
Other Punctuation
ValueCountFrequency (%)
. 253963
47.4%
: 134709
25.1%
; 123326
23.0%
, 10668
 
2.0%
/ 5315
 
1.0%
& 3632
 
0.7%
? 1748
 
0.3%
" 1387
 
0.3%
# 984
 
0.2%
' 412
 
0.1%
Other values (5) 116
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 96673
13.6%
5 95617
13.4%
0 89759
12.6%
4 70754
9.9%
2 67002
9.4%
7 66254
9.3%
8 64489
9.1%
6 57819
8.1%
3 52279
7.3%
9 51618
7.2%
Math Symbol
ValueCountFrequency (%)
| 24725
92.2%
+ 1585
 
5.9%
> 212
 
0.8%
< 199
 
0.7%
= 94
 
0.4%
Open Punctuation
ValueCountFrequency (%)
( 103206
> 99.9%
[ 17
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 103204
> 99.9%
] 17
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 4132071
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 8726
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 335
100.0%
Initial Punctuation
ValueCountFrequency (%)
4
100.0%
Final Punctuation
ValueCountFrequency (%)
2
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 25207283
81.8%
Common 5622922
 
18.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 2608470
 
10.3%
t 2311910
 
9.2%
o 2139574
 
8.5%
e 2129723
 
8.4%
n 1708168
 
6.8%
a 1671073
 
6.6%
r 1554155
 
6.2%
s 1249854
 
5.0%
c 981043
 
3.9%
l 809850
 
3.2%
Other values (42) 8043463
31.9%
Common
ValueCountFrequency (%)
(space) 4132071
73.5%
. 253963
 
4.5%
: 134709
 
2.4%
; 123326
 
2.2%
( 103206
 
1.8%
) 103204
 
1.8%
1 96673
 
1.7%
5 95617
 
1.7%
0 89759
 
1.6%
4 70754
 
1.3%
Other values (30) 419640
 
7.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 30830198
> 99.9%
Punctuation 7
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 4132071
 
13.4%
i 2608470
 
8.5%
t 2311910
 
7.5%
o 2139574
 
6.9%
e 2129723
 
6.9%
n 1708168
 
5.5%
a 1671073
 
5.4%
r 1554155
 
5.0%
s 1249854
 
4.1%
c 981043
 
3.2%
Other values (79) 10344157
33.6%
Punctuation
ValueCountFrequency (%)
4
57.1%
2
28.6%
1
 
14.3%

fieldNumber
Text

Missing 

Distinct1516
Distinct (%)34.0%
Missing720044
Missing (%)99.4%
Memory size5.5 MiB

Length

Max length209
Median length45
Mean length35.25537634
Min length1

Characters and Unicode

Total characters157380
Distinct characters72
Distinct categories8
Distinct scripts2
Distinct blocks1

Unique

Unique1229
Unique (%)27.5%

Sample

1st rowMTC-08009; MTC-08009B; MTC-08009B (A); MTC-08009B (B)
2nd row217
3rd rowYP79-2
4th rowTDP31
5th row82-10; 82-19; 82-21; 82-22; 82-4; 82-6; 82-7
ValueCountFrequency (%)
82-10 767
 
4.2%
82-21 767
 
4.2%
82-22 767
 
4.2%
82-4 767
 
4.2%
82-6 767
 
4.2%
82-7 767
 
4.2%
82-19 767
 
4.2%
mtc-04028dd 329
 
1.8%
mtc-04028h 329
 
1.8%
mtc-04028gg 329
 
1.8%
Other values (1502) 11759
64.9%
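Some fieldNumber values hold several field numbers separated by `;` (for example `82-10; 82-19; 82-21; 82-22; 82-4; 82-6; 82-7`), and the token counts above list those pieces individually: each of the seven `82-*` numbers from that sample row appears 767 times. For analyses that need one field number per row, a hedged Python sketch, assuming `;` is the only list separator (parenthesised suffixes such as `(A)` are kept as part of the number); `split_field_numbers` is a hypothetical name:

```python
def split_field_numbers(value: str) -> list[str]:
    """Split a multi-valued fieldNumber on ';' and drop empty pieces."""
    return [part.strip() for part in value.split(";") if part.strip()]


# Sample rows from the table above.
for value in [
    "82-10; 82-19; 82-21; 82-22; 82-4; 82-6; 82-7",
    "MTC-08009; MTC-08009B; MTC-08009B (A); MTC-08009B (B)",
    "YP79-2",
]:
    print(split_field_numbers(value))
```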

Most occurring characters

ValueCountFrequency (%)
0 18832
12.0%
- 15944
10.1%
2 14513
9.2%
(space) 13651
 
8.7%
; 12694
 
8.1%
8 11928
 
7.6%
C 9870
 
6.3%
M 9201
 
5.8%
T 8674
 
5.5%
4 7381
 
4.7%
Other values (62) 34692
22.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 72021
45.8%
Uppercase Letter 40992
26.0%
Dash Punctuation 15944
 
10.1%
Space Separator 13651
 
8.7%
Other Punctuation 12856
 
8.2%
Lowercase Letter 1716
 
1.1%
Close Punctuation 100
 
0.1%
Open Punctuation 100
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
c 290
16.9%
a 205
11.9%
m 201
11.7%
e 185
10.8%
l 159
9.3%
p 150
8.7%
o 130
7.6%
t 77
 
4.5%
r 70
 
4.1%
i 55
 
3.2%
Other values (16) 194
11.3%
Uppercase Letter
ValueCountFrequency (%)
C 9870
24.1%
M 9201
22.4%
T 8674
21.2%
A 1535
 
3.7%
G 1513
 
3.7%
B 1509
 
3.7%
E 1291
 
3.1%
D 1285
 
3.1%
F 1161
 
2.8%
H 1137
 
2.8%
Other values (15) 3816
 
9.3%
Decimal Number
ValueCountFrequency (%)
0 18832
26.1%
2 14513
20.2%
8 11928
16.6%
4 7381
 
10.2%
1 6730
 
9.3%
3 3699
 
5.1%
5 3595
 
5.0%
7 2000
 
2.8%
9 1780
 
2.5%
6 1563
 
2.2%
Other Punctuation
ValueCountFrequency (%)
; 12694
98.7%
. 62
 
0.5%
, 49
 
0.4%
# 34
 
0.3%
/ 10
 
0.1%
& 4
 
< 0.1%
' 3
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 15944
100.0%
Space Separator
ValueCountFrequency (%)
(space) 13651
100.0%
Close Punctuation
ValueCountFrequency (%)
) 100
100.0%
Open Punctuation
ValueCountFrequency (%)
( 100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 114672
72.9%
Latin 42708
 
27.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 9870
23.1%
M 9201
21.5%
T 8674
20.3%
A 1535
 
3.6%
G 1513
 
3.5%
B 1509
 
3.5%
E 1291
 
3.0%
D 1285
 
3.0%
F 1161
 
2.7%
H 1137
 
2.7%
Other values (41) 5532
13.0%
Common
ValueCountFrequency (%)
0 18832
16.4%
- 15944
13.9%
2 14513
12.7%
(space) 13651
11.9%
; 12694
11.1%
8 11928
10.4%
4 7381
 
6.4%
1 6730
 
5.9%
3 3699
 
3.2%
5 3595
 
3.1%
Other values (11) 5705
 
5.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 157380
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 18832
12.0%
- 15944
10.1%
2 14513
9.2%
(space) 13651
 
8.7%
; 12694
 
8.1%
8 11928
 
7.6%
C 9870
 
6.3%
M 9201
 
5.8%
T 8674
 
5.5%
4 7381
 
4.7%
Other values (62) 34692
22.0%

eventDate
Text

Missing 

Distinct17205
Distinct (%)6.9%
Missing474561
Missing (%)65.5%
Memory size5.5 MiB

Length

Max length21
Median length10
Mean length7.503406722
Min length4

Characters and Unicode

Total characters1875454
Distinct characters12
Distinct categories3
Distinct scripts1
Distinct blocks1

Unique

Unique5657
Unique (%)2.3%

Sample

1st row1985-01-23
2nd row1974
3rd row1980
4th row1963
5th row1956
ValueCountFrequency (%)
1999 3773
 
1.5%
1980 3743
 
1.5%
1982 3572
 
1.4%
1984-02 3350
 
1.3%
1998 3320
 
1.3%
1997 3308
 
1.3%
1995 3121
 
1.2%
2001 2935
 
1.2%
1974 2850
 
1.1%
1971 2519
 
1.0%
Other values (17195) 217456
87.0%
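eventDate mixes precisions: bare years (`1974`), year-months (`1984-02`) and full dates (`1985-01-23`), plus a small number of values containing `/` (2,521 such characters). Assuming the `/` values are ISO 8601 `start/end` intervals, as the Darwin Core `eventDate` term allows (the maximum length of 21 characters matches `YYYY-MM-DD/YYYY-MM-DD`), a minimal Python sketch that expands each value into the first and last day it covers might look like this. `parse_event_date` is a hypothetical helper, the interval in the example is made up, and other ISO 8601 forms (for example abbreviated interval ends such as `1985-01-23/25`) are rejected rather than guessed.

```python
import calendar
import re
from datetime import date

# YYYY, YYYY-MM or YYYY-MM-DD, the shapes seen in the eventDate samples.
_PARTIAL_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def _expand(part: str) -> tuple[date, date]:
    """Return the first and last calendar day covered by a partial ISO 8601 date."""
    m = _PARTIAL_DATE.match(part)
    if m is None:
        raise ValueError(f"unsupported date form: {part!r}")
    year = int(m.group(1))
    if m.group(2) is None:                        # year only
        return date(year, 1, 1), date(year, 12, 31)
    month = int(m.group(2))
    if m.group(3) is None:                        # year-month
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    day = date(year, month, int(m.group(3)))      # full date
    return day, day


def parse_event_date(value: str) -> tuple[date, date]:
    """Expand a single date or a start/end interval to its (first, last) day."""
    if "/" in value:
        start_part, end_part = value.split("/", 1)
        return _expand(start_part)[0], _expand(end_part)[1]
    return _expand(value)


for v in ["1974", "1984-02", "1985-01-23", "1985-01-23/1985-02-05"]:
    print(v, parse_event_date(v))
```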

Most occurring characters

ValueCountFrequency (%)
1 389845
20.8%
9 321056
17.1%
- 287687
15.3%
0 240421
12.8%
8 128673
 
6.9%
7 115893
 
6.2%
2 104754
 
5.6%
6 84492
 
4.5%
4 67249
 
3.6%
5 66574
 
3.5%
Other values (2) 68810
 
3.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1585246
84.5%
Dash Punctuation 287687
 
15.3%
Other Punctuation 2521
 
0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 389845
24.6%
9 321056
20.3%
0 240421
15.2%
8 128673
 
8.1%
7 115893
 
7.3%
2 104754
 
6.6%
6 84492
 
5.3%
4 67249
 
4.2%
5 66574
 
4.2%
3 66289
 
4.2%
Dash Punctuation
ValueCountFrequency (%)
- 287687
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 2521
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1875454
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 389845
20.8%
9 321056
17.1%
- 287687
15.3%
0 240421
12.8%
8 128673
 
6.9%
7 115893
 
6.2%
2 104754
 
5.6%
6 84492
 
4.5%
4 67249
 
3.6%
5 66574
 
3.5%
Other values (2) 68810
 
3.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1875454
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 389845
20.8%
9 321056
17.1%
- 287687
15.3%
0 240421
12.8%
8 128673
 
6.9%
7 115893
 
6.2%
2 104754
 
5.6%
6 84492
 
4.5%
4 67249
 
3.6%
5 66574
 
3.5%
Other values (2) 68810
 
3.7%

startDayOfYear
Real number (ℝ)

Missing 

Distinct366
Distinct (%)0.3%
Missing593923
Missing (%)82.0%
Infinite0
Infinite (%)0.0%
Mean192.2771605
Minimum1
Maximum366
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum1
5th percentile48
Q1137
median201
Q3248
95th percentile310
Maximum366
Range365
Interquartile range (IQR)111

Descriptive statistics

Standard deviation78.76365518
Coefficient of variation (CV)0.409636043
Kurtosis-0.5202115862
Mean192.2771605
Median Absolute Deviation (MAD)56
Skewness-0.3074105143
Sum25108513
Variance6203.713378
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
198 1192
 
0.2%
191 1146
 
0.2%
195 1139
 
0.2%
223 1099
 
0.2%
196 1078
 
0.1%
194 1065
 
0.1%
251 1041
 
0.1%
138 995
 
0.1%
137 971
 
0.1%
136 949
 
0.1%
Other values (356) 119910
 
16.6%
(Missing) 593923
82.0%
ValueCountFrequency (%)
1 20
 
< 0.1%
2 59
 
< 0.1%
3 24
 
< 0.1%
4 125
< 0.1%
5 150
< 0.1%
ValueCountFrequency (%)
366 8
 
< 0.1%
365 19
< 0.1%
364 22
< 0.1%
363 29
< 0.1%
362 8
 
< 0.1%
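startDayOfYear and endDayOfYear are ordinal days within the year (1 = 1 January), so the maximum of 366 can only come from 31 December of a leap year. Assuming these fields are computed from the first and last day covered by eventDate (the identical 593,923 missing values in both fields are consistent with that, but the report does not state it), the conversion in Python is a single call to `date.timetuple().tm_yday`; `day_of_year` below is a hypothetical helper name.

```python
from datetime import date


def day_of_year(d: date) -> int:
    """Ordinal day within the year: 1 for 1 January, up to 366 in leap years."""
    return d.timetuple().tm_yday


print(day_of_year(date(1985, 1, 23)))   # → 23 (a full eventDate from the samples)
print(day_of_year(date(1999, 12, 31)))  # → 365 (common year)
print(day_of_year(date(2000, 12, 31)))  # → 366 (leap year)
```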

endDayOfYear
Real number (ℝ)

Missing 

Distinct366
Distinct (%)0.3%
Missing593923
Missing (%)82.0%
Infinite0
Infinite (%)0.0%
Mean192.4239844
Minimum1
Maximum366
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum1
5th percentile49
Q1137
median201
Q3248
95th percentile310
Maximum366
Range365
Interquartile range (IQR)111

Descriptive statistics

Standard deviation78.66872144
Coefficient of variation (CV)0.4088301242
Kurtosis-0.526737264
Mean192.4239844
Median Absolute Deviation (MAD)56
Skewness-0.3026076446
Sum25127686
Variance6188.767733
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
198 1191
 
0.2%
191 1141
 
0.2%
195 1132
 
0.2%
196 1085
 
0.1%
194 1066
 
0.1%
251 1041
 
0.1%
138 996
 
0.1%
137 969
 
0.1%
136 949
 
0.1%
203 935
 
0.1%
Other values (356) 120080
 
16.6%
(Missing) 593923
82.0%
ValueCountFrequency (%)
1 20
 
< 0.1%
2 58
 
< 0.1%
3 24
 
< 0.1%
4 125
< 0.1%
5 150
< 0.1%
ValueCountFrequency (%)
366 8
 
< 0.1%
365 19
< 0.1%
364 23
< 0.1%
363 27
< 0.1%
362 8
 
< 0.1%

year
Real number (ℝ)

Missing 

Distinct190
Distinct (%)0.1%
Missing474684
Missing (%)65.5%
Infinite0
Infinite (%)0.0%
Mean1960.539007
Minimum1805
Maximum2023
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum1805
5th percentile1900
Q11941
median1970
Q31982
95th percentile1998
Maximum2023
Range218
Interquartile range (IQR)41

Descriptive statistics

Standard deviation30.37698185
Coefficient of variation (CV)0.01549419916
Kurtosis0.1001964871
Mean1960.539007
Median Absolute Deviation (MAD)16
Skewness-0.9196027756
Sum489789697
Variance922.7610263
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1980 7356
 
1.0%
1981 7186
 
1.0%
1982 7123
 
1.0%
1976 6483
 
0.9%
1971 6407
 
0.9%
1973 5775
 
0.8%
1984 5606
 
0.8%
1974 5415
 
0.7%
1999 5014
 
0.7%
1987 4898
 
0.7%
Other values (180) 188561
 
26.0%
(Missing) 474684
65.5%
ValueCountFrequency (%)
1805 1
 
< 0.1%
1810 1
 
< 0.1%
1817 1
 
< 0.1%
1823 9
< 0.1%
1824 1
 
< 0.1%
ValueCountFrequency (%)
2023 1
 
< 0.1%
2022 1
 
< 0.1%
2021 1
 
< 0.1%
2020 15
< 0.1%
2019 6
 
< 0.1%

month
Real number (ℝ)

Missing 

Distinct12
Distinct (%)< 0.1%
Missing572740
Missing (%)79.1%
Infinite0
Infinite (%)0.0%
Mean6.718583628
Minimum1
Maximum12
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum1
5th percentile2
Q15
median7
Q39
95th percentile11
Maximum12
Range11
Interquartile range (IQR)4

Descriptive statistics

Standard deviation 2.66173107
Coefficient of variation (CV) 0.3961744346
Kurtosis -0.5834458529
Mean 6.718583628
Median Absolute Deviation (MAD) 2
Skewness -0.2828535196
Sum 1019666
Variance 7.084812289
Monotonicity Not monotonic

Histogram with fixed size bins (bins=12)

Common values
Value Count Frequency (%)
8 25644 3.5%
7 25351 3.5%
6 14941 2.1%
5 14611 2.0%
10 14469 2.0%
9 14237 2.0%
4 11303 1.6%
2 8497 1.2%
3 8211 1.1%
11 6642 0.9%
Other values (2) 7862 1.1%
(Missing) 572740 79.1%

Minimum 5 values
Value Count Frequency (%)
1 4792 0.7%
2 8497 1.2%
3 8211 1.1%
4 11303 1.6%
5 14611 2.0%

Maximum 5 values
Value Count Frequency (%)
12 3070 0.4%
11 6642 0.9%
10 14469 2.0%
9 14237 2.0%
8 25644 3.5%

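The Common values table shows the ten most frequent values, folds the remainder into an "Other values (k)" row, and ends with a "(Missing)" row. For month, the two folded values are 1 and 12 (4792 + 3070 = 7862), and all rows together sum to 724508. Here is a sketch of that layout, again assuming that the percentage denominator is the total row count. `common_values` is a hypothetical helper.

```python
from collections import Counter

def common_values(counts, n_rows, top=10):
    """Build a 'Common values' table: the top-N values, an 'Other values (k)'
    row for the remainder and a '(Missing)' row.

    counts: mapping of value -> count for the non-missing observations
    n_rows: total number of observations, used as the percentage denominator
    """
    ranked = Counter(counts).most_common()
    shown, rest = ranked[:top], ranked[top:]
    rows = [(str(v), c, round(100 * c / n_rows, 1)) for v, c in shown]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other,
                     round(100 * other / n_rows, 1)))
    missing = n_rows - sum(counts.values())
    if missing:
        rows.append(("(Missing)", missing, round(100 * missing / n_rows, 1)))
    return rows


# Month counts as reported above (non-missing values only).
month = {8: 25644, 7: 25351, 6: 14941, 5: 14611, 10: 14469, 9: 14237,
         4: 11303, 2: 8497, 3: 8211, 11: 6642, 1: 4792, 12: 3070}
for row in common_values(month, n_rows=724508)[-2:]:
    print(row)
# ('Other values (2)', 7862, 1.1)
# ('(Missing)', 572740, 79.1)
```
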
day
Real number (ℝ)

Alert: Missing

Distinct 31
Distinct (%) < 0.1%
Missing 596444
Missing (%) 82.3%
Infinite 0
Infinite (%) 0.0%
Mean 15.82372876
Minimum 1
Maximum 31
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 5.5 MiB

Quantile statistics

Minimum 1
5th percentile 2
Q1 9
Median 16
Q3 23
95th percentile 29
Maximum 31
Range 30
Interquartile range (IQR) 14

Descriptive statistics

Standard deviation 8.581155667
Coefficient of variation (CV) 0.542296686
Kurtosis -1.116069344
Mean 15.82372876
Median Absolute Deviation (MAD) 7
Skewness -0.00373108911
Sum 2026450
Variance 73.63623258
Monotonicity Not monotonic

Histogram with fixed size bins (bins=31)

Common values
Value Count Frequency (%)
17 5139 0.7%
16 4975 0.7%
18 4960 0.7%
13 4630 0.6%
20 4577 0.6%
23 4547 0.6%
8 4524 0.6%
14 4502 0.6%
15 4418 0.6%
10 4351 0.6%
Other values (21) 81441 11.2%
(Missing) 596444 82.3%

Minimum 5 values
Value Count Frequency (%)
1 3812 0.5%
2 4062 0.6%
3 3807 0.5%
4 3694 0.5%
5 3756 0.5%

Maximum 5 values
Value Count Frequency (%)
31 2241 0.3%
30 3914 0.5%
29 3746 0.5%
28 4135 0.6%
27 4079 0.6%

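Skewness and Kurtosis are reported as bias-corrected sample statistics. The sketch below implements the adjusted Fisher-Pearson skewness (G1) and the bias-corrected excess kurtosis (G2). It assumes that the profiler delegates to pandas' `Series.skew()` and `Series.kurt()`, which use these estimators. The function names are illustrative.

```python
import math

def skewness(data):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data)
    m3 = sum((x - mean) ** 3 for x in data)
    return n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5

def excess_kurtosis(data):
    """Bias-corrected sample excess kurtosis (G2); 0 for a normal distribution."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data)
    m4 = sum((x - mean) ** 4 for x in data)
    numerator = n * (n + 1) * (n - 1) * m4
    denominator = (n - 2) * (n - 3) * m2 ** 2
    return numerator / denominator - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


print(skewness([1, 2, 3, 4, 5]))         # 0.0 (symmetric data)
print(excess_kurtosis([1, 2, 3, 4, 5]))  # approximately -1.2
```

For reference, a continuous uniform distribution has zero skewness and an excess kurtosis of -1.2. The reported values for day (skewness -0.0037, kurtosis -1.116) are therefore consistent with collection days spread fairly evenly over the month.
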
verbatimEventDate
Text

Alert: Missing

Distinct 17805
Distinct (%) 6.4%
Missing 445814
Missing (%) 61.5%
Memory size 5.5 MiB

Length

Max length 61
Median length 11
Mean length 11.41229808
Min length 4

Characters and Unicode

Total characters 3180539
Distinct characters 69
Distinct categories 9
Distinct scripts 2
Distinct blocks 1

Unique

Unique 5871
Unique (%) 2.1%

Sample

1st row 23 JAN 1985
2nd row April, 1928
3rd row -- --- 1980
4th row -- --- 1963
5th row -- --- 1956

Words
Value Count Frequency (%)
(value not shown) 235730 28.9%
aug 23677 2.9%
jul 22916 2.8%
summer 20031 2.5%
jun 14619 1.8%
may 14325 1.8%
oct 14287 1.7%
to 13955 1.7%
sep 13176 1.6%
apr 10764 1.3%
Other values (1210) 433163 53.0%

Most occurring characters

Value Count Frequency (%)
- 633590 19.9%
(space) 537949 16.9%
1 382844 12.0%
9 314473 9.9%
8 105770 3.3%
0 101858 3.2%
7 96225 3.0%
2 94879 3.0%
6 69663 2.2%
A 63864 2.0%
Other values (59) 779424 24.5%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1340357 42.1%
Dash Punctuation 633590 19.9%
Space Separator 537949 16.9%
Uppercase Letter 491521 15.5%
Lowercase Letter 169648 5.3%
Other Punctuation 6422 0.2%
Math Symbol 1026 < 0.1%
Open Punctuation 13 < 0.1%
Close Punctuation 13 < 0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
m 40530 23.9%
u 32141 18.9%
e 26707 15.7%
r 24584 14.5%
t 7049 4.2%
a 5225 3.1%
l 4565 2.7%
g 3709 2.2%
n 3604 2.1%
p 3590 2.1%
Other values (13) 17944 10.6%
Uppercase Letter
Value Count Frequency (%)
A 63864 13.0%
U 61193 12.4%
J 48266 9.8%
O 36480 7.4%
S 35414 7.2%
T 28143 5.7%
N 24509 5.0%
P 23974 4.9%
E 23721 4.8%
G 23661 4.8%
Other values (11) 122296 24.9%
Decimal Number
Value Count Frequency (%)
1 382844 28.6%
9 314473 23.5%
8 105770 7.9%
0 101858 7.6%
7 96225 7.2%
2 94879 7.1%
6 69663 5.2%
3 60386 4.5%
4 58552 4.4%
5 55707 4.2%
Other Punctuation
Value Count Frequency (%)
, 3733 58.1%
. 1309 20.4%
' 650 10.1%
/ 634 9.9%
? 92 1.4%
; 2 < 0.1%
& 1 < 0.1%
* 1 < 0.1%
Math Symbol
Value Count Frequency (%)
| 1017 99.1%
+ 5 0.5%
~ 4 0.4%
Dash Punctuation
Value Count Frequency (%)
- 633590 100.0%
Space Separator
Value Count Frequency (%)
(space) 537949 100.0%
Open Punctuation
Value Count Frequency (%)
( 13 100.0%
Close Punctuation
Value Count Frequency (%)
) 13 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 2519370 79.2%
Latin 661169 20.8%

Most frequent character per script

Latin
Value Count Frequency (%)
A 63864 9.7%
U 61193 9.3%
J 48266 7.3%
m 40530 6.1%
O 36480 5.5%
S 35414 5.4%
u 32141 4.9%
T 28143 4.3%
e 26707 4.0%
r 24584 3.7%
Other values (34) 263847 39.9%
Common
Value Count Frequency (%)
- 633590 25.1%
(space) 537949 21.4%
1 382844 15.2%
9 314473 12.5%
8 105770 4.2%
0 101858 4.0%
7 96225 3.8%
2 94879 3.8%
6 69663 2.8%
3 60386 2.4%
Other values (15) 121733 4.8%

Most occurring blocks

Value Count Frequency (%)
ASCII 3180539 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
- 633590 19.9%
(space) 537949 16.9%
1 382844 12.0%
9 314473 9.9%
8 105770 3.3%
0 101858 3.2%
7 96225 3.0%
2 94879 3.0%
6 69663 2.2%
A 63864 2.0%
Other values (59) 779424 24.5%

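verbatimEventDate keeps dates as they were written, so the same information appears in several shapes: "23 JAN 1985", "April, 1928", and "-- --- 1980", where "--" placeholders stand for unknown parts. Seasons such as "summer" and ranges joined by "to" also occur. The sketch below is a hypothetical parser that recovers (year, month, day) from the simple shapes and returns None for anything it cannot read. It keeps only the last month or day it sees, so ranges would need separate handling.

```python
import re

MONTH_NAMES = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
               "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]

def month_number(token):
    """Map 'JAN', 'Sept', 'April', ... to 1-12; None if not a month (prefix)."""
    t = token.upper()
    if len(t) < 3:
        return None
    for number, name in enumerate(MONTH_NAMES, start=1):
        if name.startswith(t):
            return number
    return None

def parse_verbatim_date(text):
    """Hypothetical helper: extract (year, month, day) from a verbatim date.
    Four-digit numbers are years, 1-31 are days; placeholders are ignored."""
    year = month = day = None
    for token in re.findall(r"[A-Za-z]+|\d+", text):
        if token.isdigit():
            value = int(token)
            if len(token) == 4:
                year = value
            elif 1 <= value <= 31:
                day = value
        elif month_number(token) is not None:
            month = month_number(token)
    return year, month, day


for s in ["23 JAN 1985", "April, 1928", "-- --- 1980"]:
    print(s, "->", parse_verbatim_date(s))
# 23 JAN 1985 -> (1985, 1, 23)
# April, 1928 -> (1928, 4, None)
# -- --- 1980 -> (1980, None, None)
```
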
locationID
Text

Alert: Missing

Distinct 66560
Distinct (%) 17.1%
Missing 335037
Missing (%) 46.2%
Memory size 5.5 MiB

Length

Max length 61
Median length 59
Mean length 5.757204002
Min length 1

Characters and Unicode

Total characters 2242264
Distinct characters 81
Distinct categories 9
Distinct scripts 2
Distinct blocks 1

Unique

Unique 40451
Unique (%) 10.4%

Sample

1st row 1612
2nd row 06
3rd row USGS LOC M533
4th row 42246
5th row 707A

Words
Value Count Frequency (%)
42246 30863 6.4%
35k 30551 6.3%
loc 19929 4.1%
sta 7656 1.6%
d 5640 1.2%
site 4020 0.8%
40193 3269 0.7%
leg 3132 0.7%
olson 2904 0.6%
41142 2897 0.6%
Other values (59519) 370823 77.0%

Most occurring characters

Value Count Frequency (%)
2 252324 11.3%
1 209625 9.3%
4 194523 8.7%
3 152357 6.8%
0 140257 6.3%
5 136706 6.1%
6 130433 5.8%
7 107242 4.8%
8 99787 4.5%
9 93127 4.2%
Other values (71) 725883 32.4%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1516381 67.6%
Uppercase Letter 531863 23.7%
Space Separator 92213 4.1%
Dash Punctuation 52032 2.3%
Other Punctuation 28932 1.3%
Lowercase Letter 15132 0.7%
Math Symbol 3062 0.1%
Close Punctuation 1336 0.1%
Open Punctuation 1313 0.1%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
O 51448 9.7%
L 50984 9.6%
C 46019 8.7%
S 44241 8.3%
A 41228 7.8%
E 37168 7.0%
K 36506 6.9%
T 30011 5.6%
I 25951 4.9%
N 20969 3.9%
Other values (16) 147338 27.7%
Lowercase Letter
Value Count Frequency (%)
e 2360 15.6%
a 1816 12.0%
g 1802 11.9%
t 1447 9.6%
o 1201 7.9%
c 1136 7.5%
i 1026 6.8%
s 789 5.2%
b 707 4.7%
n 562 3.7%
Other values (16) 2286 15.1%
Other Punctuation
Value Count Frequency (%)
. 13863 47.9%
, 10529 36.4%
* 2055 7.1%
/ 1776 6.1%
' 442 1.5%
# 178 0.6%
; 41 0.1%
? 34 0.1%
: 7 < 0.1%
" 6 < 0.1%
Decimal Number
Value Count Frequency (%)
2 252324 16.6%
1 209625 13.8%
4 194523 12.8%
3 152357 10.0%
0 140257 9.2%
5 136706 9.0%
6 130433 8.6%
7 107242 7.1%
8 99787 6.6%
9 93127 6.1%
Math Symbol
Value Count Frequency (%)
+ 3039 99.2%
= 23 0.8%
Close Punctuation
Value Count Frequency (%)
) 1335 99.9%
] 1 0.1%
Open Punctuation
Value Count Frequency (%)
( 1304 99.3%
[ 9 0.7%
Space Separator
Value Count Frequency (%)
(space) 92213 100.0%
Dash Punctuation
Value Count Frequency (%)
- 52032 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1695269 75.6%
Latin 546995 24.4%

Most frequent character per script

Latin
Value Count Frequency (%)
O 51448 9.4%
L 50984 9.3%
C 46019 8.4%
S 44241 8.1%
A 41228 7.5%
E 37168 6.8%
K 36506 6.7%
T 30011 5.5%
I 25951 4.7%
N 20969 3.8%
Other values (42) 162470 29.7%
Common
Value Count Frequency (%)
2 252324 14.9%
1 209625 12.4%
4 194523 11.5%
3 152357 9.0%
0 140257 8.3%
5 136706 8.1%
6 130433 7.7%
7 107242 6.3%
8 99787 5.9%
9 93127 5.5%
Other values (19) 178888 10.6%

Most occurring blocks

Value Count Frequency (%)
ASCII 2242264 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
2 252324 11.3%
1 209625 9.3%
4 194523 8.7%
3 152357 6.8%
0 140257 6.3%
5 136706 6.1%
6 130433 5.8%
7 107242 4.8%
8 99787 4.5%
9 93127 4.2%
Other values (71) 725883 32.4%

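The category tables group characters by Unicode general category. Python's `unicodedata.category()` returns the two-letter code, such as "Lu" or "Nd". The mapping to long names below is written for illustration and covers only the categories that appear in this report. Under this classification, `|`, `+`, `=` and `~` count as Math Symbol and `_` counts as Connector Punctuation, which matches the tables in this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report
# (illustrative subset; unicodedata itself only returns the two-letter codes).
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Zs": "Space Separator",
    "Pd": "Dash Punctuation", "Po": "Other Punctuation",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pc": "Connector Punctuation", "Sm": "Math Symbol",
    "Sc": "Currency Symbol",
}

def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["USGS LOC M533"]))
# Counter({'Uppercase Letter': 8, 'Decimal Number': 3, 'Space Separator': 2})
```
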
higherGeography
Text

Alert: Missing

Distinct 4708
Distinct (%) 0.8%
Missing 148417
Missing (%) 20.5%
Memory size 5.5 MiB

Length

Max length 111
Median length 97
Mean length 42.17362361
Min length 4

Characters and Unicode

Total characters 24295845
Distinct characters 68
Distinct categories 7
Distinct scripts 2
Distinct blocks 2

Unique

Unique 1213
Unique (%) 0.2%

Sample

1st row North America, United States, Florida
2nd row Africa, Kenya, Marsabit
3rd row North America, United States, Nevada, Pershing County
4th row Cuba, Camaguey Prov
5th row North America, United States, North Carolina, Beaufort County

Words
Value Count Frequency (%)
north 537307 16.4%
america 480121 14.7%
united 421781 12.9%
states 421705 12.9%
county 259124 7.9%
carolina 46843 1.4%
canada 38942 1.2%
texas 38273 1.2%
colorado 35917 1.1%
beaufort 33680 1.0%
Other values (2951) 959718 29.3%

Most occurring characters

Value Count Frequency (%)
(space) 2697320 11.1%
t 2343978 9.6%
a 2051368 8.4%
e 1823223 7.5%
i 1571709 6.5%
r 1497295 6.2%
o 1387848 5.7%
, 1279367 5.3%
n 1260166 5.2%
s 766919 3.2%
Other values (58) 7616652 31.3%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 17040948 70.1%
Uppercase Letter 3272221 13.5%
Space Separator 2697320 11.1%
Other Punctuation 1284183 5.3%
Dash Punctuation 1169 < 0.1%
Open Punctuation 2 < 0.1%
Close Punctuation 2 < 0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
t 2343978 13.8%
a 2051368 12.0%
e 1823223 10.7%
i 1571709 9.2%
r 1497295 8.8%
o 1387848 8.1%
n 1260166 7.4%
s 766919 4.5%
h 662498 3.9%
c 650930 3.8%
Other values (24) 3025014 17.8%
Uppercase Letter
Value Count Frequency (%)
N 590551 18.0%
A 571156 17.5%
C 498307 15.2%
S 484309 14.8%
U 430602 13.2%
B 108340 3.3%
M 87750 2.7%
O 60025 1.8%
T 59534 1.8%
P 52139 1.6%
Other values (16) 329508 10.1%
Other Punctuation
Value Count Frequency (%)
, 1279367 99.6%
. 3038 0.2%
' 1757 0.1%
? 21 < 0.1%
Space Separator
Value Count Frequency (%)
(space) 2697320 100.0%
Dash Punctuation
Value Count Frequency (%)
- 1169 100.0%
Open Punctuation
Value Count Frequency (%)
( 2 100.0%
Close Punctuation
Value Count Frequency (%)
) 2 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 20313169 83.6%
Common 3982676 16.4%

Most frequent character per script

Latin
Value Count Frequency (%)
t 2343978 11.5%
a 2051368 10.1%
e 1823223 9.0%
i 1571709 7.7%
r 1497295 7.4%
o 1387848 6.8%
n 1260166 6.2%
s 766919 3.8%
h 662498 3.3%
c 650930 3.2%
Other values (50) 6297235 31.0%
Common
Value Count Frequency (%)
(space) 2697320 67.7%
, 1279367 32.1%
. 3038 0.1%
' 1757 < 0.1%
- 1169 < 0.1%
? 21 < 0.1%
( 2 < 0.1%
) 2 < 0.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 24288672 > 99.9%
None 7173 < 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
(space) 2697320 11.1%
t 2343978 9.7%
a 2051368 8.4%
e 1823223 7.5%
i 1571709 6.5%
r 1497295 6.2%
o 1387848 5.7%
, 1279367 5.3%
n 1260166 5.2%
s 766919 3.2%
Other values (50) 7609479 31.3%
None
Value Count Frequency (%)
ó 3473 48.4%
í 2116 29.5%
á 1037 14.5%
é 539 7.5%
ñ 4 0.1%
è 2 < 0.1%
ä 1 < 0.1%
ú 1 < 0.1%

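The word tables appear to lowercase each value, split it on whitespace, and trim punctuation from both ends of each token: "America," is counted as "america", while "martha's" keeps its internal apostrophe. The sketch below assumes that tokenization; `word_counts` is a hypothetical name. The same behaviour explains why "north" (537307) outnumbers "america" (480121) in higherGeography: names such as "North Carolina" add a "north" without an "america".

```python
import string
from collections import Counter

def word_counts(values):
    """Assumed tokenization: lowercase, split on whitespace, then strip
    punctuation from both ends of each token (inner characters are kept)."""
    counts = Counter()
    for text in values:
        for token in text.lower().split():
            counts[token.strip(string.punctuation)] += 1
    return counts


wc = word_counts(["North America, United States, North Carolina",
                  "Martha's Vineyard", "-- --- 1980"])
print(wc["north"], wc["america"], wc["martha's"], wc[""])  # 2 1 1 2
```

Under the same assumption, placeholder tokens such as "--" trim to an empty string. That could account for the unlabeled top row (235730) in the verbatimEventDate word table.
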
continent
Text

Alert: Missing

Distinct 7
Distinct (%) < 0.1%
Missing 195168
Missing (%) 26.9%
Memory size 5.5 MiB

Length

Max length 13
Median length 13
Mean length 12.51518684
Min length 4

Characters and Unicode

Total characters 6624789
Distinct characters 15
Distinct categories 2
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row NORTH_AMERICA
2nd row AFRICA
3rd row NORTH_AMERICA
4th row NORTH_AMERICA
5th row NORTH_AMERICA

Words
Value Count Frequency (%)
north_america 480938 90.9%
south_america 11223 2.1%
europe 9975 1.9%
asia 9042 1.7%
oceania 8130 1.5%
africa 6638 1.3%
antarctica 3394 0.6%

Most occurring characters

Value Count Frequency (%)
A 1042124 15.7%
R 993106 15.0%
E 520241 7.9%
I 519365 7.8%
C 513717 7.8%
O 510266 7.7%
T 498949 7.5%
N 492462 7.4%
H 492161 7.4%
_ 492161 7.4%
Other values (5) 550237 8.3%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 6132628 92.6%
Connector Punctuation 492161 7.4%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
A 1042124 17.0%
R 993106 16.2%
E 520241 8.5%
I 519365 8.5%
C 513717 8.4%
O 510266 8.3%
T 498949 8.1%
N 492462 8.0%
H 492161 8.0%
M 492161 8.0%
Other values (4) 58076 0.9%
Connector Punctuation
Value Count Frequency (%)
_ 492161 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 6132628 92.6%
Common 492161 7.4%

Most frequent character per script

Latin
Value Count Frequency (%)
A 1042124 17.0%
R 993106 16.2%
E 520241 8.5%
I 519365 8.5%
C 513717 8.4%
O 510266 8.3%
T 498949 8.1%
N 492462 8.0%
H 492161 8.0%
M 492161 8.0%
Other values (4) 58076 0.9%
Common
Value Count Frequency (%)
_ 492161 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 6624789 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
A 1042124 15.7%
R 993106 15.0%
E 520241 7.9%
I 519365 7.8%
C 513717 7.8%
O 510266 7.7%
T 498949 7.5%
N 492462 7.4%
H 492161 7.4%
_ 492161 7.4%
Other values (5) 550237 8.3%

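The same character count produces different percentages in different tables because each table uses its own denominator: all characters, the characters in one category, or those in one script or block. The snippet below is a worked check for the letter A in continent. `pct` is an illustrative helper that rounds to one decimal, as the tables do.

```python
def pct(count, total):
    """Frequency (%) rounded to one decimal, as displayed in the tables."""
    return round(100 * count / total, 1)

# Figures from the continent tables above.
TOTAL_CHARACTERS = 6624789   # "Total characters"
UPPERCASE = 6132628          # "Uppercase Letter" category (= Latin script here)
COUNT_A = 1042124

print(pct(COUNT_A, TOTAL_CHARACTERS))  # 15.7, as in "Most occurring characters"
print(pct(COUNT_A, UPPERCASE))         # 17.0, as in the Uppercase Letter table

# Each NORTH_AMERICA or SOUTH_AMERICA value contributes exactly one "_".
assert 480938 + 11223 == 492161
```
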
waterBody
Text

Alert: Missing

Distinct 172
Distinct (%) 0.6%
Missing 696851
Missing (%) 96.2%
Memory size 5.5 MiB

Length

Max length 61
Median length 54
Mean length 21.95758759
Min length 8

Characters and Unicode

Total characters 607281
Distinct characters 49
Distinct categories 4
Distinct scripts 2
Distinct blocks 1

Unique

Unique 58
Unique (%) 0.2%

Sample

1st row North Atlantic Ocean
2nd row North Pacific Ocean
3rd row North Atlantic Ocean, Caribbean Sea
4th row North Atlantic Ocean
5th row North Atlantic Ocean

Words
Value Count Frequency (%)
ocean 26667 28.1%
north 18835 19.9%
atlantic 13621 14.4%
pacific 8356 8.8%
sea 5778 6.1%
indian 4034 4.3%
south 2993 3.2%
timor 2479 2.6%
of 2181 2.3%
gulf 2067 2.2%
Other values (146) 7758 8.2%

Most occurring characters

Value Count Frequency (%)
(space) 67112 11.1%
a 66029 10.9%
c 60399 9.9%
n 52729 8.7%
t 51240 8.4%
i 42959 7.1%
e 39252 6.5%
o 28732 4.7%
O 27050 4.5%
r 26329 4.3%
Other values (39) 145450 24.0%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 439588 72.4%
Uppercase Letter 92948 15.3%
Space Separator 67112 11.1%
Other Punctuation 7633 1.3%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
a 66029 15.0%
c 60399 13.7%
n 52729 12.0%
t 51240 11.7%
i 42959 9.8%
e 39252 8.9%
o 28732 6.5%
r 26329 6.0%
h 22202 5.1%
l 16619 3.8%
Other values (15) 33098 7.5%
Uppercase Letter
Value Count Frequency (%)
O 27050 29.1%
N 18947 20.4%
A 14632 15.7%
S 9530 10.3%
P 8558 9.2%
I 4100 4.4%
M 2579 2.8%
T 2567 2.8%
G 2317 2.5%
C 1788 1.9%
Other values (12) 880 0.9%
Space Separator
Value Count Frequency (%)
(space) 67112 100.0%
Other Punctuation
Value Count Frequency (%)
, 7633 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 532536 87.7%
Common 74745 12.3%

Most frequent character per script

Latin
Value Count Frequency (%)
a 66029 12.4%
c 60399 11.3%
n 52729 9.9%
t 51240 9.6%
i 42959 8.1%
e 39252 7.4%
o 28732 5.4%
O 27050 5.1%
r 26329 4.9%
h 22202 4.2%
Other values (37) 115615 21.7%
Common
Value Count Frequency (%)
(space) 67112 89.8%
, 7633 10.2%

Most occurring blocks

Value Count Frequency (%)
ASCII 607281 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
(space) 67112 11.1%
a 66029 10.9%
c 60399 9.9%
n 52729 8.7%
t 51240 8.4%
i 42959 7.1%
e 39252 6.5%
o 28732 4.7%
O 27050 4.5%
r 26329 4.3%
Other values (39) 145450 24.0%

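Missing (%) is taken over all 724508 observations. Distinct (%) and Unique (%) are taken over the non-missing values only. This convention is inferred from the figures: for verbatimEventDate, 17805 / (724508 - 445814) ≈ 6.4%, which matches the reported value, whereas 17805 / 724508 would give about 2.5%. The sketch below recomputes the three percentages; `overview` is a hypothetical name.

```python
def overview(n_rows, n_missing, n_distinct, n_unique):
    """Recompute Missing (%), Distinct (%) and Unique (%) for one variable.

    Inferred convention: Missing (%) over all rows; Distinct (%) and
    Unique (%) over the non-missing rows only.
    """
    present = n_rows - n_missing
    return {
        "missing_pct": round(100 * n_missing / n_rows, 1),
        "distinct_pct": round(100 * n_distinct / present, 1),
        "unique_pct": round(100 * n_unique / present, 1),
    }


print(overview(724508, 696851, 172, 58))  # waterBody
# {'missing_pct': 96.2, 'distinct_pct': 0.6, 'unique_pct': 0.2}
```
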
islandGroup
Text

Alert: Missing

Distinct 33
Distinct (%) 4.1%
Missing 723710
Missing (%) 99.9%
Memory size 5.5 MiB

Length

Max length 25
Median length 24
Mean length 16.78571429
Min length 5

Characters and Unicode

Total characters 13395
Distinct characters 46
Distinct categories 4
Distinct scripts 2
Distinct blocks 1

Unique

Unique 13
Unique (%) 1.6%

Sample

1st row Mariana Islands
2nd row Northern Mariana Islands
3rd row Gilbert Islands
4th row Gilbert Islands
5th row Aleutian Islands

Words
Value Count Frequency (%)
islands 765 44.5%
marshall 241 14.0%
mariana 155 9.0%
gilbert 135 7.9%
northern 134 7.8%
marianas 120 7.0%
solomon 21 1.2%
ryukyu 18 1.0%
hawaiian 18 1.0%
antilles 15 0.9%
Other values (26) 97 5.6%

Most occurring characters

Value Count Frequency (%)
a 2202 16.4%
s 1936 14.5%
l 1461 10.9%
n 1270 9.5%
r 960 7.2%
(space) 921 6.9%
d 800 6.0%
I 765 5.7%
M 527 3.9%
i 498 3.7%
Other values (36) 2055 15.3%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 10752 80.3%
Uppercase Letter 1720 12.8%
Space Separator 921 6.9%
Other Punctuation 2 < 0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
a 2202 20.5%
s 1936 18.0%
l 1461 13.6%
n 1270 11.8%
r 960 8.9%
d 800 7.4%
i 498 4.6%
h 376 3.5%
e 374 3.5%
t 298 2.8%
Other values (13) 577 5.4%
Uppercase Letter
Value Count Frequency (%)
I 765 44.5%
M 527 30.6%
N 140 8.1%
G 135 7.8%
A 25 1.5%
L 24 1.4%
S 24 1.4%
H 18 1.0%
R 18 1.0%
C 11 0.6%
Other values (11) 33 1.9%
Space Separator
Value Count Frequency (%)
(space) 921 100.0%
Other Punctuation
Value Count Frequency (%)
. 2 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 12472 93.1%
Common 923 6.9%

Most frequent character per script

Latin
Value Count Frequency (%)
a 2202 17.7%
s 1936 15.5%
l 1461 11.7%
n 1270 10.2%
r 960 7.7%
d 800 6.4%
I 765 6.1%
M 527 4.2%
i 498 4.0%
h 376 3.0%
Other values (34) 1677 13.4%
Common
Value Count Frequency (%)
(space) 921 99.8%
. 2 0.2%

Most occurring blocks

Value Count Frequency (%)
ASCII 13395 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
a 2202 16.4%
s 1936 14.5%
l 1461 10.9%
n 1270 9.5%
r 960 7.2%
(space) 921 6.9%
d 800 6.0%
I 765 5.7%
M 527 3.9%
i 498 3.7%
Other values (36) 2055 15.3%

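Mean length is total characters divided by the number of non-missing values. For islandGroup this is 13395 / 798 ≈ 16.786. The small sketch below applies this to the five sample rows, using the hypothetical helper `length_summary`. Median length is left out on purpose: the reported medians cannot be plain medians of value lengths. For example, locationID reports a median of 59 with a mean of 5.76, which is impossible for an ordinary median.

```python
def length_summary(values):
    """Min, max, total and mean length of the non-missing string values."""
    lengths = [len(v) for v in values if v is not None]
    total = sum(lengths)
    return {"min": min(lengths), "max": max(lengths),
            "total": total, "mean": total / len(lengths)}


samples = ["Mariana Islands", "Northern Mariana Islands", "Gilbert Islands",
           "Gilbert Islands", "Aleutian Islands"]
print(length_summary(samples))
# {'min': 15, 'max': 24, 'total': 85, 'mean': 17.0}

# Reported islandGroup figures: total characters / non-missing values.
print(round(13395 / (724508 - 723710), 8))  # 16.78571429
```
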
island
Text

Alert: Missing

Distinct 87
Distinct (%) 0.9%
Missing 714401
Missing (%) 98.6%
Memory size 5.5 MiB

Length

Max length 21
Median length 4
Mean length 6.015335906
Min length 3

Characters and Unicode

Total characters 60797
Distinct characters 50
Distinct categories 5
Distinct scripts 2
Distinct blocks 2

Unique

Unique 38
Unique (%) 0.4%

Sample

1st row Oahu
2nd row Oahu
3rd row Oahu
4th row Animasola Island
5th row Molokai

Words
Value Count Frequency (%)
oahu 5926 51.1%
molokai 2218 19.1%
saint 944 8.1%
helena 938 8.1%
atoll 241 2.1%
saipan 132 1.1%
guam 129 1.1%
onotoa 116 1.0%
martha's 108 0.9%
vineyard 108 0.9%
Other values (91) 728 6.3%

Most occurring characters

Value Count Frequency (%)
a 11360 18.7%
u 6232 10.3%
h 6099 10.0%
O 6043 9.9%
o 5165 8.5%
i 4062 6.7%
l 3813 6.3%
n 2689 4.4%
k 2476 4.1%
M 2342 3.9%
Other values (40) 10516 17.3%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 47612 78.3%
Uppercase Letter 11591 19.1%
Space Separator 1481 2.4%
Other Punctuation 109 0.2%
Dash Punctuation 4 < 0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
a 11360 23.9%
u 6232 13.1%
h 6099 12.8%
o 5165 10.8%
i 4062 8.5%
l 3813 8.0%
n 2689 5.6%
k 2476 5.2%
e 2309 4.8%
t 1709 3.6%
Other values (16) 1698 3.6%
Uppercase Letter
Value Count Frequency (%)
O 6043 52.1%
M 2342 20.2%
S 1177 10.2%
H 941 8.1%
A 273 2.4%
G 140 1.2%
B 138 1.2%
E 125 1.1%
V 121 1.0%
I 89 0.8%
Other values (11) 202 1.7%
Space Separator
Value Count Frequency (%)
(space) 1481 100.0%
Other Punctuation
Value Count Frequency (%)
' 109 100.0%
Dash Punctuation
Value Count Frequency (%)
- 4 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 59203 97.4%
Common 1594 2.6%

Most frequent character per script

Latin
Value Count Frequency (%)
a 11360 19.2%
u 6232 10.5%
h 6099 10.3%
O 6043 10.2%
o 5165 8.7%
i 4062 6.9%
l 3813 6.4%
n 2689 4.5%
k 2476 4.2%
M 2342 4.0%
Other values (37) 8922 15.1%
Common
Value Count Frequency (%)
(space) 1481 92.9%
' 109 6.8%
- 4 0.3%

Most occurring blocks

Value Count Frequency (%)
ASCII 60794 > 99.9%
None 3 < 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
a 11360 18.7%
u 6232 10.3%
h 6099 10.0%
O 6043 9.9%
o 5165 8.5%
i 4062 6.7%
l 3813 6.3%
n 2689 4.4%
k 2476 4.1%
M 2342 3.9%
Other values (38) 10513 17.3%
None
Value Count Frequency (%)
ñ 2 66.7%
é 1 33.3%

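The accented letters listed under the "None" block are ñ and é here, and ó, í and á in other columns. They are code points between U+0080 and U+00FF, which Unicode assigns to the Latin-1 Supplement block. The "None" label suggests that the profiler's block lookup did not resolve a name for them. The sketch below classifies characters by code-point range and covers only the two blocks seen in this report. The input "Añasco" is just an example string.

```python
def block_of(ch):
    """Name the Unicode block for the two ranges seen in this report."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "Basic Latin (ASCII)"
    if cp <= 0xFF:
        return "Latin-1 Supplement"
    return "other"  # outside this sketch's scope; a full lookup would name it

def block_counts(text):
    """Count characters per block, in order of first appearance."""
    counts = {}
    for ch in text:
        name = block_of(ch)
        counts[name] = counts.get(name, 0) + 1
    return counts


print(block_counts("A\u00f1asco"))  # example input containing ñ
# {'Basic Latin (ASCII)': 5, 'Latin-1 Supplement': 1}
```
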
countryCode
Text

Alert: Missing

Distinct 185
Distinct (%) < 0.1%
Missing 158422
Missing (%) 21.9%
Memory size 5.5 MiB

Length

Max length 2
Median length 2
Mean length 2
Min length 2

Characters and Unicode

Total characters 1132172
Distinct characters 26
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 16
Unique (%) < 0.1%

Sample

1st row US
2nd row KE
3rd row US
4th row CU
5th row US

Words
Value Count Frequency (%)
us 428942 75.8%
ca 39076 6.9%
pa 8629 1.5%
do 6290 1.1%
mx 3952 0.7%
co 3623 0.6%
fr 3541 0.6%
aq 3460 0.6%
cr 3282 0.6%
pr 3114 0.6%
Other values (175) 62177 11.0%

Most occurring characters

Value Count Frequency (%)
U 434999 38.4%
S 434979 38.4%
A 57869 5.1%
C 53653 4.7%
P 19200 1.7%
E 14200 1.3%
R 12973 1.1%
O 11631 1.0%
D 10040 0.9%
M 9508 0.8%
Other values (16) 73120 6.5%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 1132172 100.0%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
U 434999 38.4%
S 434979 38.4%
A 57869 5.1%
C 53653 4.7%
P 19200 1.7%
E 14200 1.3%
R 12973 1.1%
O 11631 1.0%
D 10040 0.9%
M 9508 0.8%
Other values (16) 73120 6.5%

Most occurring scripts

Value Count Frequency (%)
Latin 1132172 100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
U 434999 38.4%
S 434979 38.4%
A 57869 5.1%
C 53653 4.7%
P 19200 1.7%
E 14200 1.3%
R 12973 1.1%
O 11631 1.0%
D 10040 0.9%
M 9508 0.8%
Other values (16) 73120 6.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 1132172 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
U 434999 38.4%
S 434979 38.4%
A 57869 5.1%
C 53653 4.7%
P 19200 1.7%
E 14200 1.3%
R 12973 1.1%
O 11631 1.0%
D 10040 0.9%
M 9508 0.8%
Other values (16) 73120 6.5%

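countryCode holds two-letter codes in the ISO 3166-1 alpha-2 style. The lowercase forms in the word table come from the word tokenization; the samples and character tables show uppercase. Every value has length 2, so total characters = 2 × (724508 - 158422) = 1132172. The sketch below is a format check using a hypothetical helper. It validates the shape of a code only, not whether the code is actually assigned.

```python
import re

ALPHA2 = re.compile(r"[A-Z]{2}")

def is_alpha2_format(code):
    """True if code has the ISO 3166-1 alpha-2 shape (two uppercase ASCII
    letters). Shape only: it does not check that the code is assigned."""
    return code is not None and ALPHA2.fullmatch(code) is not None


print([is_alpha2_format(c) for c in ["US", "KE", "us", "USA", None]])
# [True, True, False, False, False]

# Every non-missing value has length 2:
assert 2 * (724508 - 158422) == 1132172
```
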
stateProvince
Text

Alert: Missing

Distinct 892
Distinct (%) 0.2%
Missing 226462
Missing (%) 31.3%
Memory size 5.5 MiB

Length

Max length 25
Median length 23
Mean length 8.789222281
Min length 3

Characters and Unicode

Total characters 4377437
Distinct characters 64
Distinct categories 7
Distinct scripts 2
Distinct blocks 2

Unique

Unique 236
Unique (%) < 0.1%

Sample

1st row Florida
2nd row Marsabit
3rd row Nevada
4th row Camaguey Prov
5th row North Carolina

Words
Value Count Frequency (%)
carolina 46813 7.5%
north 45129 7.2%
texas 38253 6.1%
colorado 35917 5.8%
california 32474 5.2%
columbia 32203 5.2%
british 32085 5.1%
alaska 28545 4.6%
new 23155 3.7%
wyoming 22778 3.6%
Other values (878) 287106 46.0%

Most occurring characters

Value Count Frequency (%)
a 622536 14.2%
i 445132 10.2%
o 412678 9.4%
r 299951 6.9%
n 262321 6.0%
l 249350 5.7%
s 213346 4.9%
e 190372 4.3%
C 155417 3.6%
t 143584 3.3%
Other values (54) 1382750 31.6%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 3624857 82.8%
Uppercase Letter 625183 14.3%
Space Separator 126412 2.9%
Dash Punctuation 508 < 0.1%
Other Punctuation 475 < 0.1%
Open Punctuation 1 < 0.1%
Close Punctuation 1 < 0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
a 622536 17.2%
i 445132 12.3%
o 412678 11.4%
r 299951 8.3%
n 262321 7.2%
l 249350 6.9%
s 213346 5.9%
e 190372 5.3%
t 143584 4.0%
h 114639 3.2%
Other values (22) 670948 18.5%
Uppercase Letter
Value Count Frequency (%)
C 155417 24.9%
N 87902 14.1%
M 48444 7.7%
T 47635 7.6%
A 45155 7.2%
B 36744 5.9%
W 32086 5.1%
H 20814 3.3%
O 19325 3.1%
I 17859 2.9%
Other values (16) 113802 18.2%
Other Punctuation
Value Count Frequency (%)
. 425 89.5%
' 50 10.5%
Space Separator
Value Count Frequency (%)
(space) 126412 100.0%
Dash Punctuation
Value Count Frequency (%)
- 508 100.0%
Open Punctuation
Value Count Frequency (%)
( 1 100.0%
Close Punctuation
Value Count Frequency (%)
) 1 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 4250040 97.1%
Common 127397 2.9%

Most frequent character per script

Latin
Value Count Frequency (%)
a 622536 14.6%
i 445132 10.5%
o 412678 9.7%
r 299951 7.1%
n 262321 6.2%
l 249350 5.9%
s 213346 5.0%
e 190372 4.5%
C 155417 3.7%
t 143584 3.4%
Other values (48) 1255353 29.5%
Common
Value Count Frequency (%)
(space) 126412 99.2%
- 508 0.4%
. 425 0.3%
' 50 < 0.1%
( 1 < 0.1%
) 1 < 0.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 4371514 99.9%
None 5923 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
a 622536 14.2%
i 445132 10.2%
o 412678 9.4%
r 299951 6.9%
n 262321 6.0%
l 249350 5.7%
s 213346 4.9%
e 190372 4.4%
C 155417 3.6%
t 143584 3.3%
Other values (48) 1376827 31.5%
None
Value Count Frequency (%)
ó 2622 44.3%
í 1945 32.8%
á 1034 17.5%
é 319 5.4%
è 2 < 0.1%
ñ 1 < 0.1%

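stateProvince mixes plain-ASCII spellings (the samples include "Camaguey Prov") with accented ones (the non-ASCII block lists ó, í, á, é). When grouping such names, one common approach is to strip diacritics after Unicode NFD decomposition. This is an assumption about downstream cleaning, not something this report does. Note that folding also turns ñ into n, which can merge names that are genuinely different. "Michoacán" below is a hypothetical example.

```python
import unicodedata

def fold_accents(text):
    """Remove diacritics: decompose (NFD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("Camag\u00fcey"))   # Camaguey
print(fold_accents("Michoac\u00e1n"))  # Michoacan
```
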
county
Text

Alert: Missing

Distinct 1997
Distinct (%) 0.7%
Missing 454433
Missing (%) 62.7%
Memory size 5.5 MiB

Length

Max length 34
Median length 29
Mean length 14.2528779
Min length 3

Characters and Unicode

Total characters 3849346
Distinct characters 65
Distinct categories 5
Distinct scripts 2
Distinct blocks 2

Unique

Unique 393
Unique (%) 0.1%

Sample

1st row Pershing County
2nd row Beaufort County
3rd row Brewster County
4th row Los Angeles County
5th row Honolulu County

Words
Value Count Frequency (%)
county 259124 45.6%
beaufort 33592 5.9%
brewster 15677 2.8%
maui 10401 1.8%
los 8883 1.6%
angeles 8865 1.6%
honolulu 5926 1.0%
san 4953 0.9%
lincoln 4346 0.8%
culberson 4132 0.7%
Other values (1945) 212334 37.4%

Most occurring characters

Value Count Frequency (%)
o 423340 11.0%
n 401510 10.4%
t 375302 9.7%
u 352655 9.2%
(space) 298158 7.7%
C 289740 7.5%
y 279783 7.3%
e 215178 5.6%
a 186491 4.8%
r 177010 4.6%
Other values (55) 850179 22.1%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 2976107 77.3%
Uppercase Letter 570194 14.8%
Space Separator 298158 7.7%
Other Punctuation 4230 0.1%
Dash Punctuation 657 < 0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
o 423340 14.2%
n 401510 13.5%
t 375302 12.6%
u 352655 11.8%
y 279783 9.4%
e 215178 7.2%
a 186491 6.3%
r 177010 5.9%
l 100058 3.4%
s 96459 3.2%
Other values (23) 368321 12.4%
Uppercase Letter
Value Count Frequency (%)
C 289740 50.8%
B 65415 11.5%
M 27388 4.8%
S 25040 4.4%
L 22655 4.0%
P 16991 3.0%
A 16627 2.9%
H 14879 2.6%
D 12691 2.2%
W 9829 1.7%
Other values (16) 68939 12.1%
Other Punctuation
Value Count Frequency (%)
. 2609 61.7%
' 1598 37.8%
? 21 0.5%
, 2 < 0.1%
Space Separator
Value Count Frequency (%)
(space) 298158 100.0%
Dash Punctuation
Value Count Frequency (%)
- 657 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 3546301 92.1%
Common 303045 7.9%

Most frequent character per script

Latin
Value Count Frequency (%)
o 423340 11.9%
n 401510 11.3%
t 375302 10.6%
u 352655 9.9%
C 289740 8.2%
y 279783 7.9%
e 215178 6.1%
a 186491 5.3%
r 177010 5.0%
l 100058 2.8%
Other values (49) 745234 21.0%
Common
Value Count Frequency (%)
(space) 298158 98.4%
. 2609 0.9%
' 1598 0.5%
- 657 0.2%
? 21 < 0.1%
, 2 < 0.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 3848100 > 99.9%
None 1246 < 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
o 423340 11.0%
n 401510 10.4%
t 375302 9.8%
u 352655 9.2%
(space) 298158 7.7%
C 289740 7.5%
y 279783 7.3%
e 215178 5.6%
a 186491 4.8%
r 177010 4.6%
Other values (48) 848933 22.1%
None
Value Count Frequency (%)
ó 851 68.3%
é 218 17.5%
í 171 13.7%
á 3 0.2%
ä 1 0.1%
ñ 1 0.1%
ú 1 0.1%

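"county" is the most frequent word in the county column, with 259124 occurrences across 270075 non-missing values, and all five sample rows end in " County". The function below is a hypothetical normalizer that drops that trailing suffix, so that "Beaufort County" and "Beaufort" group together when counting. Other suffixes are deliberately left untouched.

```python
import re

COUNTY_SUFFIX = re.compile(r"\s+County$")

def county_base_name(name):
    """Drop a trailing ' County' so 'Beaufort County' and 'Beaufort' match.
    Hypothetical normalizer; other suffixes are left untouched."""
    return COUNTY_SUFFIX.sub("", name.strip())


for name in ["Pershing County", "Los Angeles County", "Maui"]:
    print(county_base_name(name))
# Pershing
# Los Angeles
# Maui
```
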
locality
Text

Alert: Missing

Distinct 31755
Distinct (%) 19.4%
Missing 560871
Missing (%) 77.4%
Memory size 5.5 MiB

Length

Max length 471
Median length 316
Mean length 59.79365302
Min length 1

Characters and Unicode

Total characters 9784454
Distinct characters 100
Distinct categories 12
Distinct scripts 2
Distinct blocks 3

Unique

Unique 21088
Unique (%) 12.9%

Sample

1st row St. Andrew Bay
2nd row Nuevitas Bay, Between Nuevitas And Pastelillo
3rd row Palos Verdes Hills; East side of Deadman's Island
4th row North slope of San Pedro Hills, ravine S of harbor City, 4200 feet N and 53.5 degrees E from 342-foot hill, 100 feet up ravine from end of Bellepoint Street (W98-30)
5th row Coyote Springs Valley; spring

Words
Value Count Frequency (%)
of 120156 7.0%
(value not shown) 34919 2.0%
and 22265 1.3%
bay 19665 1.1%
the 18421 1.1%
on 17778 1.0%
from 16823 1.0%
n 16777 1.0%
feet 15757 0.9%
river 15334 0.9%
Other values (34131) 1421831 82.7%

Most occurring characters

ValueCountFrequency (%)
(space) 1556089
 
15.9%
e 696401
 
7.1%
a 667613
 
6.8%
o 563197
 
5.8%
n 459256
 
4.7%
t 454549
 
4.6%
r 411335
 
4.2%
i 400968
 
4.1%
l 325764
 
3.3%
s 321160
 
3.3%
Other values (90) 3928122
40.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5945227
60.8%
Space Separator 1556089
 
15.9%
Uppercase Letter 1177808
 
12.0%
Decimal Number 550644
 
5.6%
Other Punctuation 394583
 
4.0%
Dash Punctuation 53241
 
0.5%
Open Punctuation 40436
 
0.4%
Close Punctuation 40130
 
0.4%
Math Symbol 26252
 
0.3%
Connector Punctuation 35
 
< 0.1%
Other values (2) 9
 
< 0.1%
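Each character is classified by its Unicode General Category. Python's standard `unicodedata.category()` returns a two-letter code such as `Ll`, which can be mapped to the long names used in the category table.

The sketch below is an illustration, not the profiler's own code. `CATEGORY_NAMES` and `category_counts` are hypothetical names, and the mapping covers only the categories that appear in this report. The standard library does not expose script or block properties, so those breakdowns would need a separate lookup.

```python
import unicodedata
from collections import Counter

# Long names for the General Category codes seen in this report (not exhaustive).
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Sm": "Math Symbol",
    "Pc": "Connector Punctuation",
    "Sc": "Currency Symbol",
    "So": "Other Symbol",
}


def category_counts(values):
    """Count characters per General Category across non-missing strings."""
    counts = Counter()
    for text in values:
        if text is None:
            continue
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


counts = category_counts(["St. Andrew Bay", "±5 ft", None])
print(counts.most_common())
```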

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 696401
11.7%
a 667613
11.2%
o 563197
 
9.5%
n 459256
 
7.7%
t 454549
 
7.6%
r 411335
 
6.9%
i 400968
 
6.7%
l 325764
 
5.5%
s 321160
 
5.4%
f 214183
 
3.6%
Other values (21) 1430801
24.1%
Uppercase Letter
ValueCountFrequency (%)
S 174300
14.8%
C 112455
 
9.5%
O 84488
 
7.2%
N 76065
 
6.5%
B 74827
 
6.4%
R 70201
 
6.0%
P 66728
 
5.7%
A 62185
 
5.3%
W 51082
 
4.3%
T 49504
 
4.2%
Other values (17) 355973
30.2%
Other Punctuation
ValueCountFrequency (%)
, 179506
45.5%
. 103955
26.3%
; 73054
18.5%
/ 19087
 
4.8%
' 7147
 
1.8%
: 4428
 
1.1%
# 4037
 
1.0%
" 1994
 
0.5%
? 703
 
0.2%
& 599
 
0.2%
Other values (5) 73
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 125210
22.7%
0 82093
14.9%
2 69469
12.6%
5 50957
9.3%
3 50931
9.2%
4 49415
 
9.0%
6 36615
 
6.6%
7 31244
 
5.7%
8 27594
 
5.0%
9 27116
 
4.9%
Math Symbol
ValueCountFrequency (%)
| 22235
84.7%
+ 2928
 
11.2%
= 1045
 
4.0%
± 36
 
0.1%
~ 8
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 37729
93.3%
{ 2081
 
5.1%
[ 626
 
1.5%
Close Punctuation
ValueCountFrequency (%)
) 37422
93.3%
} 2082
 
5.2%
] 626
 
1.6%
Currency Symbol
ValueCountFrequency (%)
$ 3
60.0%
2
40.0%
Space Separator
ValueCountFrequency (%)
(space) 1556089
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 53241
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 35
100.0%
Other Symbol
ValueCountFrequency (%)
° 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7123035
72.8%
Common 2661419
 
27.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 696401
 
9.8%
a 667613
 
9.4%
o 563197
 
7.9%
n 459256
 
6.4%
t 454549
 
6.4%
r 411335
 
5.8%
i 400968
 
5.6%
l 325764
 
4.6%
s 321160
 
4.5%
f 214183
 
3.0%
Other values (48) 2608609
36.6%
Common
ValueCountFrequency (%)
(space) 1556089
58.5%
, 179506
 
6.7%
1 125210
 
4.7%
. 103955
 
3.9%
0 82093
 
3.1%
; 73054
 
2.7%
2 69469
 
2.6%
- 53241
 
2.0%
5 50957
 
1.9%
3 50931
 
1.9%
Other values (32) 316914
 
11.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9784239
> 99.9%
None 213
 
< 0.1%
Currency Symbols 2
 
< 0.1%
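The block breakdown groups characters by code-point range. The accented letters and symbols in this column (ñ, ±, à, í, á, °, é, ö) are listed under the block label "None", although the Unicode Standard places all of them in the Latin-1 Supplement block (U+0080 to U+00FF). The Currency Symbols block spans U+20A0 to U+20CF.

The sketch below shows how such a lookup could work, using a small hand-written range table that covers only three blocks. `BLOCKS`, `block_of` and `block_counts` are hypothetical names. Characters outside the table return `None`.

```python
from collections import Counter

# Inclusive code-point ranges for a few Unicode blocks (not a complete table).
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),  # Unicode block name: Basic Latin
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x20A0, 0x20CF, "Currency Symbols"),
]


def block_of(ch):
    """Return the block name for one character, or None if it is not listed."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None


def block_counts(values):
    """Count characters per block across non-missing strings."""
    return Counter(
        block_of(ch) for text in values if text is not None for ch in text
    )


counts = block_counts(["Peñón", "€5", None])
print(counts)
```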

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 1556089
 
15.9%
e 696401
 
7.1%
a 667613
 
6.8%
o 563197
 
5.8%
n 459256
 
4.7%
t 454549
 
4.6%
r 411335
 
4.2%
i 400968
 
4.1%
l 325764
 
3.3%
s 321160
 
3.3%
Other values (81) 3927907
40.1%
None
ValueCountFrequency (%)
ñ 93
43.7%
± 36
 
16.9%
à 36
 
16.9%
í 27
 
12.7%
á 14
 
6.6%
° 4
 
1.9%
é 2
 
0.9%
ö 1
 
0.5%
Currency Symbols
ValueCountFrequency (%)
2
100.0%

verbatimElevation
Text

Missing 

Distinct: 7
Distinct (%): 3.6%
Missing: 724311
Missing (%): > 99.9%
Memory size: 5.5 MiB

Length

Max length: 88
Median length: 88
Mean length: 81.14720812
Min length: 8
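Missing (%) is the missing count divided by the 724,508 observations. For verbatimElevation that is 724311 / 724508 ≈ 99.97%, which the report shows as "> 99.9%" rather than 100.0%. Likewise, 1 / 70,243 is shown as "< 0.1%", while 19 / 29,496 ≈ 0.064% is shown as "0.1%".

Those figures suggest the rule below: round to one decimal, then flag results that round to exactly 0.0 or 100.0 without being exact. This is inferred from the report's output, not taken from the profiler's source. `format_percent` is a hypothetical name.

```python
def format_percent(part, whole):
    """Format part/whole as a one-decimal percentage, flagging values hidden by rounding."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    pct = 100 * part / whole
    shown = f"{pct:.1f}"
    if shown == "0.0" and pct > 0:
        return "< 0.1%"
    if shown == "100.0" and pct < 100:
        return "> 99.9%"
    return f"{shown}%"


TOTAL_ROWS = 724508  # number of observations in this dataset
print(format_percent(724311, TOTAL_ROWS))  # > 99.9%  (verbatimElevation missing)
print(format_percent(560871, TOTAL_ROWS))  # 77.4%    (locality missing)
```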

Characters and Unicode

Total characters: 15986
Distinct characters: 55
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 2
Unique (%): 1.0%

Sample

1st row: Elevation for Rampart Cave derived from Google Earth by Dr. Jim Mead on 4 Decemeber 2023
2nd row: Approx.450-500ft Above Base Of Fm
3rd row: Elevation for Rampart Cave derived from Google Earth by Dr. Jim Mead on 4 Decemeber 2023
4th row: Elevation for Rampart Cave derived from Google Earth by Dr. Jim Mead on 4 Decemeber 2023
5th row: Elevation for Rampart Cave derived from Google Earth by Dr. Jim Mead on 4 Decemeber 2023
ValueCountFrequency (%)
elevation 161
 
5.5%
by 161
 
5.5%
2023 161
 
5.5%
decemeber 161
 
5.5%
4 161
 
5.5%
mead 161
 
5.5%
jim 161
 
5.5%
dr 161
 
5.5%
on 161
 
5.5%
earth 161
 
5.5%
Other values (38) 1300
44.7%
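The word table appears to lower-case each value, split it on whitespace, and trim punctuation from both ends of each token. For example, "Dr." is counted as `dr`, while tokens with inner punctuation such as `no.4` or `jett's` survive intact.

The sketch below implements that tokenisation. It is inferred from the tables, not from the profiler's code, and `word_counts` is a hypothetical name. Under this scheme, a token made only of punctuation, such as a stand-alone `-`, collapses to an empty string.

```python
import string
from collections import Counter


def word_counts(values):
    """Lower-case, split on whitespace, strip edge punctuation, and count tokens."""
    counts = Counter()
    for text in values:
        if text is None:
            continue
        for token in text.lower().split():
            counts[token.strip(string.punctuation)] += 1
    return counts


rows = [
    "Elevation for Rampart Cave derived from Google Earth by Dr. Jim Mead on 4 Decemeber 2023",
    "Approx.450-500ft Above Base Of Fm",
    "Reef - low water",
    None,
]
counts = word_counts(rows)
print(counts.most_common(5))
```

Common words such as "of" and "the" rank highly in the locality table, which suggests no stop-word filtering was applied in this run.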

Most occurring characters

ValueCountFrequency (%)
(space) 2713
17.0%
e 1696
 
10.6%
r 1185
 
7.4%
o 1092
 
6.8%
a 1023
 
6.4%
m 656
 
4.1%
t 562
 
3.5%
v 533
 
3.3%
i 527
 
3.3%
d 497
 
3.1%
Other values (45) 5502
34.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 10285
64.3%
Space Separator 2713
 
17.0%
Uppercase Letter 1740
 
10.9%
Decimal Number 968
 
6.1%
Other Punctuation 239
 
1.5%
Math Symbol 29
 
0.2%
Dash Punctuation 12
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1696
16.5%
r 1185
11.5%
o 1092
10.6%
a 1023
9.9%
m 656
 
6.4%
t 562
 
5.5%
v 533
 
5.2%
i 527
 
5.1%
d 497
 
4.8%
n 407
 
4.0%
Other values (13) 2107
20.5%
Uppercase Letter
ValueCountFrequency (%)
D 322
18.5%
E 322
18.5%
C 194
11.1%
M 185
10.6%
J 161
9.3%
G 161
9.3%
R 161
9.3%
A 64
 
3.7%
B 53
 
3.0%
O 25
 
1.4%
Other values (8) 92
 
5.3%
Decimal Number
ValueCountFrequency (%)
2 354
36.6%
0 209
21.6%
4 173
17.9%
3 161
16.6%
5 40
 
4.1%
1 25
 
2.6%
6 5
 
0.5%
8 1
 
0.1%
Other Punctuation
ValueCountFrequency (%)
. 196
82.0%
, 42
 
17.6%
/ 1
 
0.4%
Space Separator
ValueCountFrequency (%)
(space) 2713
100.0%
Math Symbol
ValueCountFrequency (%)
+ 29
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 12
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12025
75.2%
Common 3961
 
24.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1696
14.1%
r 1185
 
9.9%
o 1092
 
9.1%
a 1023
 
8.5%
m 656
 
5.5%
t 562
 
4.7%
v 533
 
4.4%
i 527
 
4.4%
d 497
 
4.1%
n 407
 
3.4%
Other values (31) 3847
32.0%
Common
ValueCountFrequency (%)
(space) 2713
68.5%
2 354
 
8.9%
0 209
 
5.3%
. 196
 
4.9%
4 173
 
4.4%
3 161
 
4.1%
, 42
 
1.1%
5 40
 
1.0%
+ 29
 
0.7%
1 25
 
0.6%
Other values (4) 19
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 15986
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 2713
17.0%
e 1696
 
10.6%
r 1185
 
7.4%
o 1092
 
6.8%
a 1023
 
6.4%
m 656
 
4.1%
t 562
 
3.5%
v 533
 
3.3%
i 527
 
3.3%
d 497
 
3.1%
Other values (45) 5502
34.4%

verbatimDepth
Text

Missing 

Distinct: 17
Distinct (%): 20.2%
Missing: 724424
Missing (%): > 99.9%
Memory size: 5.5 MiB

Length

Max length: 14
Median length: 10
Mean length: 5.523809524
Min length: 4

Characters and Unicode

Total characters: 464
Distinct characters: 40
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 9
Unique (%): 10.7%

Sample

1st row: reef
2nd row: Beach
3rd row: ?48 Ms
4th row: Beach
5th row: Intertidal
ValueCountFrequency (%)
reef 30
27.5%
beach 25
22.9%
low 9
 
8.3%
ms 8
 
7.3%
water 7
 
6.4%
48 6
 
5.5%
no.4 4
 
3.7%
mnb 3
 
2.8%
57ms 2
 
1.8%
25 2
 
1.8%
Other values (12) 13
11.9%

Most occurring characters

ValueCountFrequency (%)
e 96
20.7%
r 40
 
8.6%
a 37
 
8.0%
f 31
 
6.7%
c 26
 
5.6%
h 25
 
5.4%
(space) 25
 
5.4%
b 18
 
3.9%
o 13
 
2.8%
t 13
 
2.8%
Other values (30) 140
30.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 339
73.1%
Uppercase Letter 51
 
11.0%
Decimal Number 32
 
6.9%
Space Separator 25
 
5.4%
Other Punctuation 16
 
3.4%
Dash Punctuation 1
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 96
28.3%
r 40
11.8%
a 37
 
10.9%
f 31
 
9.1%
c 26
 
7.7%
h 25
 
7.4%
b 18
 
5.3%
o 13
 
3.8%
t 13
 
3.8%
s 10
 
2.9%
Other values (7) 30
 
8.8%
Uppercase Letter
ValueCountFrequency (%)
M 12
23.5%
B 10
19.6%
L 9
17.6%
W 8
15.7%
N 4
 
7.8%
F 2
 
3.9%
A 1
 
2.0%
S 1
 
2.0%
U 1
 
2.0%
C 1
 
2.0%
Other values (2) 2
 
3.9%
Decimal Number
ValueCountFrequency (%)
4 11
34.4%
8 8
25.0%
5 4
 
12.5%
7 3
 
9.4%
0 3
 
9.4%
2 2
 
6.2%
3 1
 
3.1%
Other Punctuation
ValueCountFrequency (%)
. 10
62.5%
? 6
37.5%
Space Separator
ValueCountFrequency (%)
(space) 25
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 390
84.1%
Common 74
 
15.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 96
24.6%
r 40
10.3%
a 37
 
9.5%
f 31
 
7.9%
c 26
 
6.7%
h 25
 
6.4%
b 18
 
4.6%
o 13
 
3.3%
t 13
 
3.3%
M 12
 
3.1%
Other values (19) 79
20.3%
Common
ValueCountFrequency (%)
(space) 25
33.8%
4 11
14.9%
. 10
 
13.5%
8 8
 
10.8%
? 6
 
8.1%
5 4
 
5.4%
7 3
 
4.1%
0 3
 
4.1%
2 2
 
2.7%
- 1
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 464
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 96
20.7%
r 40
 
8.6%
a 37
 
8.0%
f 31
 
6.7%
c 26
 
5.6%
h 25
 
5.4%
(space) 25
 
5.4%
b 18
 
3.9%
o 13
 
2.8%
t 13
 
2.8%
Other values (30) 140
30.2%

decimalLatitude
Real number (ℝ)

Missing 

Distinct: 34309
Distinct (%): 33.0%
Missing: 620570
Missing (%): 85.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.17761578
Minimum: -77.9033
Maximum: 89.13
Zeros: 12
Zeros (%): < 0.1%
Negative: 5725
Negative (%): 0.8%
Memory size: 5.5 MiB

Quantile statistics

Minimum: -77.9033
5th percentile: -9.0417
Q1: 30.2267
Median: 37.54725
Q3: 45.743025
95th percentile: 59.895255
Maximum: 89.13
Range: 167.0333
Interquartile range (IQR): 15.516325

Descriptive statistics

Standard deviation: 18.98229075
Coefficient of variation (CV): 0.5246971185
Kurtosis: 4.688030722
Mean: 36.17761578
Median Absolute Deviation (MAD): 7.40415
Skewness: -1.613703618
Sum: 3760229.028
Variance: 360.3273622
Monotonicity: Not monotonic
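Most of these figures follow from textbook definitions:

- CV is the standard deviation divided by the mean (18.98229075 / 36.17761578 ≈ 0.5247).
- Variance is the squared standard deviation.
- IQR is Q3 - Q1.
- MAD is the median of absolute deviations from the median.

The sketch below uses only the standard library. It assumes the sample (n - 1) standard deviation and linearly interpolated quartiles, which are the pandas defaults. It is not the profiler's own implementation, `describe` is a hypothetical name, and skewness and kurtosis are omitted.

```python
import math
import statistics


def describe(values):
    """Summary statistics over non-missing numeric values."""
    xs = sorted(v for v in values if v is not None)
    if len(xs) < 2:
        raise ValueError("need at least two non-missing values")
    # "inclusive" quartiles interpolate linearly between order statistics.
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    mean = statistics.fmean(xs)
    variance = statistics.variance(xs)  # sample variance (n - 1 denominator)
    std = math.sqrt(variance)
    mad = statistics.median(abs(x - med) for x in xs)
    return {
        "mean": mean,
        "median": med,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "range": xs[-1] - xs[0],
        "std": std,
        "variance": variance,
        "cv": std / mean,  # undefined (ZeroDivisionError) when the mean is 0
        "mad": mad,
    }


stats = describe([1, 2, 3, 4, 10, None])
print(stats)
```

Applied to decimalLatitude, `variance` should reproduce 18.98229075² ≈ 360.3274.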
Histogram with fixed size bins (bins=50)
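Fixed-size bins split the range [min, max] into equal-width intervals. With bins=50, each decimalLatitude bin would be 167.0333 / 50 ≈ 3.34 degrees wide.

The sketch below shows that binning. It assumes half-open bins with the last bin closed so that the maximum is counted, which is the convention `numpy.histogram` uses. `fixed_width_histogram` is a hypothetical name.

```python
def fixed_width_histogram(values, bins=50):
    """Count non-missing values in `bins` equal-width bins spanning [min, max]."""
    xs = [v for v in values if v is not None]
    if not xs or bins < 1:
        raise ValueError("need non-missing values and at least one bin")
    lo, hi = min(xs), max(xs)
    if hi == lo:
        # Degenerate range: every value lands in the first bin.
        return [len(xs)] + [0] * (bins - 1), [lo] * (bins + 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        # min() keeps the maximum (and any rounding overshoot) in the last, closed bin.
        counts[min(int((x - lo) / width), bins - 1)] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


counts, edges = fixed_width_histogram([0, 1, 2, 3, 4, None], bins=4)
print(counts)  # [1, 1, 1, 2]
```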
Common values
ValueCountFrequency (%)
44.6458 1686
 
0.2%
17.5 673
 
0.1%
29.8119 329
 
< 0.1%
33.1767 323
 
< 0.1%
34.6405 307
 
< 0.1%
38.8295 287
 
< 0.1%
41.1458 279
 
< 0.1%
48.1104 243
 
< 0.1%
40.6184 235
 
< 0.1%
31.6767 227
 
< 0.1%
Other values (34299) 99349
 
13.7%
(Missing) 620570
85.7%
Minimum 5 values
ValueCountFrequency (%)
-77.9033 5
 
< 0.1%
-77.58 1
 
< 0.1%
-77.57 5
 
< 0.1%
-77.5 15
< 0.1%
-76.98 1
 
< 0.1%
Maximum 5 values
ValueCountFrequency (%)
89.13 3
 
< 0.1%
88.7817 9
< 0.1%
88.515 7
< 0.1%
88.0367 7
< 0.1%
87.75 7
< 0.1%

decimalLongitude
Real number (ℝ)

Missing 

Distinct: 35343
Distinct (%): 34.0%
Missing: 620570
Missing (%): 85.7%
Infinite: 0
Infinite (%): 0.0%
Mean: -84.45552615
Minimum: -179.57
Maximum: 179.8
Zeros: 19
Zeros (%): < 0.1%
Negative: 95623
Negative (%): 13.2%
Memory size: 5.5 MiB

Quantile statistics

Minimum: -179.57
5th percentile: -156.775
Q1: -122.706
Median: -87.3072
Q3: -75.610425
95th percentile: 88.7181
Maximum: 179.8
Range: 359.37
Interquartile range (IQR): 47.095575

Descriptive statistics

Standard deviation: 63.087641
Coefficient of variation (CV): -0.7469924571
Kurtosis: 5.28951012
Mean: -84.45552615
Median Absolute Deviation (MAD): 17.5088
Skewness: 2.138850984
Sum: -8778138.477
Variance: 3980.050447
Monotonicity: Not monotonic
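The textbook definitions of CV, IQR, range and variance apply unchanged to decimalLongitude. Because the mean longitude is negative, the coefficient of variation comes out negative too. The reported quantiles, extremes and standard deviation reproduce the other derived figures:

```latex
\begin{aligned}
\mathrm{CV} &= \frac{\sigma}{\mu} = \frac{63.087641}{-84.45552615} \approx -0.747 \\
\sigma^2 &= 63.087641^2 \approx 3980.05 \\
\mathrm{IQR} &= Q_3 - Q_1 = -75.610425 - (-122.706) = 47.095575 \\
\mathrm{Range} &= \max - \min = 179.8 - (-179.57) = 359.37
\end{aligned}
```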
Histogram with fixed size bins (bins=50)
Common values
ValueCountFrequency (%)
-123.908 1686
 
0.2%
-95.0833 673
 
0.1%
-103.252 329
 
< 0.1%
-98.6878 321
 
< 0.1%
-105.851 307
 
< 0.1%
-76.8473 287
 
< 0.1%
-115.358 279
 
< 0.1%
-123.934 243
 
< 0.1%
-108.207 235
 
< 0.1%
-123.18 230
 
< 0.1%
Other values (35333) 99348
 
13.7%
(Missing) 620570
85.7%
Minimum 5 values
ValueCountFrequency (%)
-179.57 1
 
< 0.1%
-179.556 12
< 0.1%
-179.555 4
 
< 0.1%
-179.55 4
 
< 0.1%
-179 1
 
< 0.1%
Maximum 5 values
ValueCountFrequency (%)
179.8 1
< 0.1%
179.58 1
< 0.1%
179.5 1
< 0.1%
179.137 2
< 0.1%
179.08 2
< 0.1%

verbatimCoordinateSystem
Text

Constant  Missing 

Distinct: 1
Distinct (%): < 0.1%
Missing: 654265
Missing (%): 90.3%
Memory size: 5.5 MiB

Length

Max length: 23
Median length: 23
Mean length: 23
Min length: 23

Characters and Unicode

Total characters: 1615589
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Degrees Minutes Seconds
2nd row: Degrees Minutes Seconds
3rd row: Degrees Minutes Seconds
4th row: Degrees Minutes Seconds
5th row: Degrees Minutes Seconds
ValueCountFrequency (%)
degrees 70243
33.3%
minutes 70243
33.3%
seconds 70243
33.3%

Most occurring characters

ValueCountFrequency (%)
e 351215
21.7%
s 210729
13.0%
(space) 140486
 
8.7%
n 140486
 
8.7%
D 70243
 
4.3%
g 70243
 
4.3%
r 70243
 
4.3%
M 70243
 
4.3%
i 70243
 
4.3%
u 70243
 
4.3%
Other values (5) 351215
21.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1264374
78.3%
Uppercase Letter 210729
 
13.0%
Space Separator 140486
 
8.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 351215
27.8%
s 210729
16.7%
n 140486
 
11.1%
g 70243
 
5.6%
r 70243
 
5.6%
i 70243
 
5.6%
u 70243
 
5.6%
t 70243
 
5.6%
c 70243
 
5.6%
o 70243
 
5.6%
Uppercase Letter
ValueCountFrequency (%)
D 70243
33.3%
M 70243
33.3%
S 70243
33.3%
Space Separator
ValueCountFrequency (%)
(space) 140486
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1475103
91.3%
Common 140486
 
8.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 351215
23.8%
s 210729
14.3%
n 140486
 
9.5%
D 70243
 
4.8%
g 70243
 
4.8%
r 70243
 
4.8%
M 70243
 
4.8%
i 70243
 
4.8%
u 70243
 
4.8%
t 70243
 
4.8%
Other values (4) 280972
19.0%
Common
ValueCountFrequency (%)
(space) 140486
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1615589
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 351215
21.7%
s 210729
13.0%
(space) 140486
 
8.7%
n 140486
 
8.7%
D 70243
 
4.3%
g 70243
 
4.3%
r 70243
 
4.3%
M 70243
 
4.3%
i 70243
 
4.3%
u 70243
 
4.3%
Other values (5) 351215
21.7%

georeferenceProtocol
Text

Missing 

Distinct: 19
Distinct (%): 0.1%
Missing: 695012
Missing (%): 95.9%
Memory size: 5.5 MiB

Length

Max length: 81
Median length: 43
Mean length: 42.23633713
Min length: 7

Characters and Unicode

Total characters: 1245803
Distinct characters: 50
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Georeferencing Quick Reference Guide (2020)
2nd row: Georeferencing Quick Reference Guide (2020)
3rd row: Georeferencing Quick Reference Guide (2020)
4th row: Georeferencing Quick Reference Guide (2020)
5th row: Georeferencing Quick Reference Guide (2020)
ValueCountFrequency (%)
georeferencing 26344
17.6%
guide 26344
17.6%
reference 24178
16.2%
2020 24178
16.2%
quick 24178
16.2%
biogeomancer 2166
 
1.4%
2006 2166
 
1.4%
august 2166
 
1.4%
consortium 2166
 
1.4%
for 2166
 
1.4%
Other values (32) 13421
9.0%

Most occurring characters

ValueCountFrequency (%)
e 237471
19.1%
(space) 119977
 
9.6%
r 87730
 
7.0%
i 84069
 
6.7%
n 82720
 
6.6%
c 81302
 
6.5%
u 58822
 
4.7%
G 54854
 
4.4%
0 52731
 
4.2%
f 52688
 
4.2%
Other values (40) 333439
26.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 844245
67.8%
Uppercase Letter 121633
 
9.8%
Space Separator 119977
 
9.6%
Decimal Number 105634
 
8.5%
Open Punctuation 24178
 
1.9%
Close Punctuation 24178
 
1.9%
Other Punctuation 5915
 
0.5%
Math Symbol 43
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 237471
28.1%
r 87730
 
10.4%
i 84069
 
10.0%
n 82720
 
9.8%
c 81302
 
9.6%
u 58822
 
7.0%
f 52688
 
6.2%
o 40962
 
4.9%
g 28625
 
3.4%
d 28111
 
3.3%
Other values (12) 61745
 
7.3%
Uppercase Letter
ValueCountFrequency (%)
G 54854
45.1%
Q 25508
21.0%
R 24645
20.3%
B 4332
 
3.6%
A 3450
 
2.8%
C 2537
 
2.1%
P 2195
 
1.8%
M 1338
 
1.1%
L 1299
 
1.1%
V 351
 
0.3%
Other values (6) 1124
 
0.9%
Decimal Number
ValueCountFrequency (%)
0 52731
49.9%
2 50522
47.8%
6 2166
 
2.1%
5 129
 
0.1%
4 43
 
< 0.1%
8 43
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 3205
54.2%
, 2710
45.8%
Space Separator
ValueCountFrequency (%)
(space) 119977
100.0%
Open Punctuation
ValueCountFrequency (%)
( 24178
100.0%
Close Punctuation
ValueCountFrequency (%)
) 24178
100.0%
Math Symbol
ValueCountFrequency (%)
+ 43
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 965878
77.5%
Common 279925
 
22.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 237471
24.6%
r 87730
 
9.1%
i 84069
 
8.7%
n 82720
 
8.6%
c 81302
 
8.4%
u 58822
 
6.1%
G 54854
 
5.7%
f 52688
 
5.5%
o 40962
 
4.2%
g 28625
 
3.0%
Other values (28) 156635
16.2%
Common
ValueCountFrequency (%)
(space) 119977
42.9%
0 52731
18.8%
2 50522
18.0%
( 24178
 
8.6%
) 24178
 
8.6%
. 3205
 
1.1%
, 2710
 
1.0%
6 2166
 
0.8%
5 129
 
< 0.1%
4 43
 
< 0.1%
Other values (2) 86
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1245803
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 237471
19.1%
(space) 119977
 
9.6%
r 87730
 
7.0%
i 84069
 
6.7%
n 82720
 
6.6%
c 81302
 
6.5%
u 58822
 
4.7%
G 54854
 
4.4%
0 52731
 
4.2%
f 52688
 
4.2%
Other values (40) 333439
26.8%

georeferenceRemarks
Text

Missing 

Distinct: 2
Distinct (%): 40.0%
Missing: 724503
Missing (%): > 99.9%
Memory size: 5.5 MiB

Length

Max length: 70
Median length: 70
Mean length: 58
Min length: 10

Characters and Unicode

Total characters: 290
Distinct characters: 27
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): 20.0%

Sample

1st row: A; B; C; D
2nd row: included in Jennifer Jett's Foram Bulk DB but not included in F Ledger
3rd row: included in Jennifer Jett's Foram Bulk DB but not included in F Ledger
4th row: included in Jennifer Jett's Foram Bulk DB but not included in F Ledger
5th row: included in Jennifer Jett's Foram Bulk DB but not included in F Ledger
ValueCountFrequency (%)
included 8
14.3%
in 8
14.3%
jennifer 4
7.1%
jett's 4
7.1%
foram 4
7.1%
bulk 4
7.1%
db 4
7.1%
but 4
7.1%
not 4
7.1%
f 4
7.1%
Other values (5) 8
14.3%

Most occurring characters

ValueCountFrequency (%)
(space) 51
17.6%
n 28
 
9.7%
e 28
 
9.7%
i 20
 
6.9%
d 20
 
6.9%
u 16
 
5.5%
t 16
 
5.5%
r 12
 
4.1%
l 12
 
4.1%
B 9
 
3.1%
Other values (17) 78
26.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 196
67.6%
Space Separator 51
 
17.6%
Uppercase Letter 36
 
12.4%
Other Punctuation 7
 
2.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 28
14.3%
e 28
14.3%
i 20
10.2%
d 20
10.2%
u 16
8.2%
t 16
8.2%
r 12
6.1%
l 12
6.1%
c 8
 
4.1%
o 8
 
4.1%
Other values (7) 28
14.3%
Uppercase Letter
ValueCountFrequency (%)
B 9
25.0%
J 8
22.2%
F 8
22.2%
D 5
13.9%
L 4
11.1%
A 1
 
2.8%
C 1
 
2.8%
Other Punctuation
ValueCountFrequency (%)
' 4
57.1%
; 3
42.9%
Space Separator
ValueCountFrequency (%)
(space) 51
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 232
80.0%
Common 58
 
20.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 28
12.1%
e 28
12.1%
i 20
 
8.6%
d 20
 
8.6%
u 16
 
6.9%
t 16
 
6.9%
r 12
 
5.2%
l 12
 
5.2%
B 9
 
3.9%
J 8
 
3.4%
Other values (14) 63
27.2%
Common
ValueCountFrequency (%)
(space) 51
87.9%
' 4
 
6.9%
; 3
 
5.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 290
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 51
17.6%
n 28
 
9.7%
e 28
 
9.7%
i 20
 
6.9%
d 20
 
6.9%
u 16
 
5.5%
t 16
 
5.5%
r 12
 
4.1%
l 12
 
4.1%
B 9
 
3.1%
Other values (17) 78
26.9%
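Distinct counts the different non-missing values, while Unique counts values that occur exactly once. Both percentages are taken over non-missing rows.

For georeferenceRemarks, the five non-missing rows hold two distinct values. Only "A; B; C; D" occurs once, giving 40.0% distinct and 20.0% unique. A column with a single distinct value is the kind the report flags as constant, as with verbatimCoordinateSystem ("Degrees Minutes Seconds" in all 70,243 non-missing rows).

The sketch below applies those definitions. `distinct_and_unique` is a hypothetical helper, not the profiler's API.

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct/unique counts and percentages over non-missing values."""
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no non-missing values")
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n,
        "unique": unique,
        "unique_pct": 100 * unique / n,
        "constant": distinct == 1,
    }


remarks = (
    ["A; B; C; D"]
    + ["included in Jennifer Jett's Foram Bulk DB but not included in F Ledger"] * 4
    + [None] * 3
)
result = distinct_and_unique(remarks)
print(result)
```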
Distinct: 10
Distinct (%): < 0.1%
Missing: 220036
Missing (%): 30.4%
Memory size: 5.5 MiB

Length

Max length: 16
Median length: 8
Mean length: 8.387123567
Min length: 8

Characters and Unicode

Total characters: 4231069
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: Mesozoic
2nd row: Cenozoic
3rd row: Cenozoic
4th row: Paleozoic
5th row: Cenozoic
ValueCountFrequency (%)
cenozoic 261752
51.9%
paleozoic 194023
38.5%
mesozoic 48343
 
9.6%
precambrian 298
 
0.1%
mesoproterozoic 41
 
< 0.1%
neoproterozoic 7
 
< 0.1%
paleoproterozoic 4
 
< 0.1%
paleoarchean 3
 
< 0.1%
mesoarchean 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
o 1008448
23.8%
e 504528
11.9%
c 504472
11.9%
i 504468
11.9%
z 504170
11.9%
n 262054
 
6.2%
C 261752
 
6.2%
a 194634
 
4.6%
P 194327
 
4.6%
l 194030
 
4.6%
Other values (9) 98186
 
2.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3726598
88.1%
Uppercase Letter 504471
 
11.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 1008448
27.1%
e 504528
13.5%
c 504472
13.5%
i 504468
13.5%
z 504170
13.5%
n 262054
 
7.0%
a 194634
 
5.2%
l 194030
 
5.2%
s 48385
 
1.3%
r 704
 
< 0.1%
Other values (5) 705
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
C 261752
51.9%
P 194327
38.5%
M 48385
 
9.6%
N 7
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 4231069
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 1008448
23.8%
e 504528
11.9%
c 504472
11.9%
i 504468
11.9%
z 504170
11.9%
n 262054
 
6.2%
C 261752
 
6.2%
a 194634
 
4.6%
P 194327
 
4.6%
l 194030
 
4.6%
Other values (9) 98186
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4231069
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 1008448
23.8%
e 504528
11.9%
c 504472
11.9%
i 504468
11.9%
z 504170
11.9%
n 262054
 
6.2%
C 261752
 
6.2%
a 194634
 
4.6%
P 194327
 
4.6%
l 194030
 
4.6%
Other values (9) 98186
 
2.3%
Distinct: 5
Distinct (%): 0.1%
Missing: 718163
Missing (%): 99.1%
Memory size: 5.5 MiB

Length

Max length: 15
Median length: 8
Mean length: 8.134121355
Min length: 8

Characters and Unicode

Total characters: 51611
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Paleozoic
2nd row: Cenozoic
3rd row: Mesozoic
4th row: Cenozoic
5th row: Cenozoic
ValueCountFrequency (%)
cenozoic 5229
82.4%
paleozoic 826
 
13.0%
mesozoic 286
 
4.5%
neoproterozoic 3
 
< 0.1%
mesoproterozoic 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
o 12698
24.6%
e 6349
12.3%
z 6345
12.3%
i 6345
12.3%
c 6345
12.3%
C 5229
10.1%
n 5229
10.1%
P 826
 
1.6%
a 826
 
1.6%
l 826
 
1.6%
Other values (6) 593
 
1.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 45266
87.7%
Uppercase Letter 6345
 
12.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 12698
28.1%
e 6349
14.0%
z 6345
14.0%
i 6345
14.0%
c 6345
14.0%
n 5229
11.6%
a 826
 
1.8%
l 826
 
1.8%
s 287
 
0.6%
r 8
 
< 0.1%
Other values (2) 8
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
C 5229
82.4%
P 826
 
13.0%
M 287
 
4.5%
N 3
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 51611
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 12698
24.6%
e 6349
12.3%
z 6345
12.3%
i 6345
12.3%
c 6345
12.3%
C 5229
10.1%
n 5229
10.1%
P 826
 
1.6%
a 826
 
1.6%
l 826
 
1.6%
Other values (6) 593
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 51611
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 12698
24.6%
e 6349
12.3%
z 6345
12.3%
i 6345
12.3%
c 6345
12.3%
C 5229
10.1%
n 5229
10.1%
P 826
 
1.6%
a 826
 
1.6%
l 826
 
1.6%
Other values (6) 593
 
1.1%
Distinct: 27
Distinct (%): < 0.1%
Missing: 245750
Missing (%): 33.9%
Memory size: 5.5 MiB

Length

Max length: 13
Median length: 10
Mean length: 8.607453035
Min length: 6

Characters and Unicode

Total characters: 4120887
Distinct characters: 35
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: Triassic
2nd row: Paleogene
3rd row: Neogene
4th row: Permian
5th row: Quaternary
ValueCountFrequency (%)
paleogene 90464
18.9%
neogene 72075
15.1%
cambrian 48808
10.2%
recent 41336
8.6%
ordovician 34462
 
7.2%
cretaceous 34238
 
7.2%
permian 32455
 
6.8%
quaternary 27798
 
5.8%
devonian 27637
 
5.8%
mississippian 19734
 
4.1%
Other values (14) 49751
10.4%

Most occurring characters

ValueCountFrequency (%)
e 751141
18.2%
n 506768
12.3%
a 458678
11.1%
i 322536
 
7.8%
o 263741
 
6.4%
r 242986
 
5.9%
g 162539
 
3.9%
s 160613
 
3.9%
P 140533
 
3.4%
c 124669
 
3.0%
Other values (25) 986683
23.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3642156
88.4%
Uppercase Letter 478731
 
11.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 751141
20.6%
n 506768
13.9%
a 458678
12.6%
i 322536
8.9%
o 263741
 
7.2%
r 242986
 
6.7%
g 162539
 
4.5%
s 160613
 
4.4%
c 124669
 
3.4%
l 120100
 
3.3%
Other values (11) 528385
14.5%
Uppercase Letter
ValueCountFrequency (%)
P 140533
29.4%
C 84743
17.7%
N 72075
15.1%
R 41337
 
8.6%
O 34462
 
7.2%
Q 27798
 
5.8%
D 27637
 
5.8%
M 20068
 
4.2%
S 11625
 
2.4%
T 9097
 
1.9%
Other values (4) 9356
 
2.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4120887
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 751141
18.2%
n 506768
12.3%
a 458678
11.1%
i 322536
 
7.8%
o 263741
 
6.4%
r 242986
 
5.9%
g 162539
 
3.9%
s 160613
 
3.9%
P 140533
 
3.4%
c 124669
 
3.0%
Other values (25) 986683
23.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4120887
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 751141
18.2%
n 506768
12.3%
a 458678
11.1%
i 322536
 
7.8%
o 263741
 
6.4%
r 242986
 
5.9%
g 162539
 
3.9%
s 160613
 
3.9%
P 140533
 
3.4%
c 124669
 
3.0%
Other values (25) 986683
23.9%
Distinct: 15
Distinct (%): 0.2%
Missing: 718167
Missing (%): 99.1%
Memory size: 5.5 MiB

Length

Max length: 13
Median length: 10
Mean length: 8.077905693
Min length: 6

Characters and Unicode

Total characters: 51222
Distinct characters: 28
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Devonian
2nd row: Neogene
3rd row: Cretaceous
4th row: Quaternary
5th row: Recent
ValueCountFrequency (%)
neogene 3161
49.9%
paleogene 1404
22.1%
quaternary 668
 
10.5%
devonian 416
 
6.6%
cretaceous 185
 
2.9%
cambrian 161
 
2.5%
ordovician 137
 
2.2%
pennsylvanian 77
 
1.2%
recent 60
 
0.9%
silurian 30
 
0.5%
Other values (5) 42
 
0.7%

Most occurring characters

ValueCountFrequency (%)
e 15352
30.0%
n 6768
13.2%
o 5307
 
10.4%
g 4565
 
8.9%
a 4026
 
7.9%
N 3161
 
6.2%
r 1892
 
3.7%
l 1511
 
2.9%
P 1484
 
2.9%
i 1053
 
2.1%
Other values (18) 6103
 
11.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 44881
87.6%
Uppercase Letter 6341
 
12.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 15352
34.2%
n 6768
15.1%
o 5307
 
11.8%
g 4565
 
10.2%
a 4026
 
9.0%
r 1892
 
4.2%
l 1511
 
3.4%
i 1053
 
2.3%
t 914
 
2.0%
u 898
 
2.0%
Other values (8) 2595
 
5.8%
Uppercase Letter
ValueCountFrequency (%)
N 3161
49.9%
P 1484
23.4%
Q 668
 
10.5%
D 416
 
6.6%
C 348
 
5.5%
O 137
 
2.2%
R 60
 
0.9%
S 31
 
0.5%
T 23
 
0.4%
J 13
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 51222
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 15352
30.0%
n 6768
13.2%
o 5307
 
10.4%
g 4565
 
8.9%
a 4026
 
7.9%
N 3161
 
6.2%
r 1892
 
3.7%
l 1511
 
2.9%
P 1484
 
2.9%
i 1053
 
2.1%
Other values (18) 6103
 
11.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 51222
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 15352
30.0%
n 6768
13.2%
o 5307
 
10.4%
g 4565
 
8.9%
a 4026
 
7.9%
N 3161
 
6.2%
r 1892
 
3.7%
l 1511
 
2.9%
P 1484
 
2.9%
i 1053
 
2.1%
Other values (18) 6103
 
11.9%
Distinct: 24
Distinct (%): < 0.1%
Missing: 376914
Missing (%): 52.0%
Memory size: 5.5 MiB

Length

Max length: 13
Median length: 11
Mean length: 6.357434248
Min length: 1

Characters and Unicode

Total characters: 2209806
Distinct characters: 32
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: Middle
2nd row: Eocene
3rd row: Pliocene
4th row: Pleistocene
5th row: Early
ValueCountFrequency (%)
middle 68576
19.7%
eocene 66980
19.3%
late 57993
16.7%
miocene 39410
11.3%
early 37474
10.8%
pliocene 32039
9.2%
pleistocene 20013
 
5.8%
oligocene 15521
 
4.5%
paleocene 7752
 
2.2%
holocene 1481
 
0.4%
Other values (10) 355
 
0.1%

Most occurring characters

ValueCountFrequency (%)
e 520801
23.6%
o 184703
 
8.4%
n 183525
 
8.3%
c 183200
 
8.3%
l 183151
 
8.3%
i 175926
 
8.0%
d 137364
 
6.2%
M 107985
 
4.9%
E 104453
 
4.7%
a 104017
 
4.7%
Other values (22) 324681
14.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1862169
84.3%
Uppercase Letter 347612
 
15.7%
Other Punctuation 25
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 520801
28.0%
o 184703
 
9.9%
n 183525
 
9.9%
c 183200
 
9.8%
l 183151
 
9.8%
i 175926
 
9.4%
d 137364
 
7.4%
a 104017
 
5.6%
t 78031
 
4.2%
r 37590
 
2.0%
Other values (9) 73861
 
4.0%
Uppercase Letter
ValueCountFrequency (%)
M 107985
31.1%
E 104453
30.0%
P 59809
17.2%
L 58036
16.7%
O 15517
 
4.5%
H 1481
 
0.4%
G 195
 
0.1%
C 77
 
< 0.1%
D 27
 
< 0.1%
U 25
 
< 0.1%
Other values (2) 7
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 25
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2209781
> 99.9%
Common 25
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 520801
23.6%
o 184703
 
8.4%
n 183525
 
8.3%
c 183200
 
8.3%
l 183151
 
8.3%
i 175926
 
8.0%
d 137364
 
6.2%
M 107985
 
4.9%
E 104453
 
4.7%
a 104017
 
4.7%
Other values (21) 324656
14.7%
Common
ValueCountFrequency (%)
/ 25
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2209806
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 520801
23.6%
o 184703
 
8.4%
n 183525
 
8.3%
c 183200
 
8.3%
l 183151
 
8.3%
i 175926
 
8.0%
d 137364
 
6.2%
M 107985
 
4.9%
E 104453
 
4.7%
a 104017
 
4.7%
Other values (22) 324681
14.7%
Distinct: 12
Distinct (%): 0.2%
Missing: 718290
Missing (%): 99.1%
Memory size: 5.5 MiB
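The Distinct (%) figure appears to be computed over the non-missing values rather than over all 724508 rows: here 12 / (724508 - 718290) = 12 / 6218 ≈ 0.19%, which rounds to the reported 0.2%, while Missing (%) uses all rows. A minimal sketch of both calculations, assuming missing values are represented as `None`; the helper name `missing_and_distinct` and the sample column are hypothetical.

```python
def missing_and_distinct(values):
    """Missing and distinct counts for one column, with percentages.

    Missing (%) uses all rows as the denominator; Distinct (%) uses only the
    non-missing rows, which is what the figures in this report imply.
    Missing values are assumed to be represented as None.
    """
    total = len(values)
    present = [v for v in values if v is not None]
    missing = total - len(present)
    distinct = len(set(present))
    return {
        "missing": missing,
        "missing_pct": 100.0 * missing / total if total else 0.0,
        "distinct": distinct,
        "distinct_pct": 100.0 * distinct / len(present) if present else 0.0,
    }


# Hypothetical six-row column with two missing entries.
stats = missing_and_distinct(["Middle", None, "Late", "Middle", None, "Pliocene"])
print(stats)
```

Plugging in this column's figures, 100 × 12 / 6218 ≈ 0.19 and 100 × 718290 / 724508 ≈ 99.14, which round to the reported 0.2% and 99.1%.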

Length

Max length: 11
Median length: 9
Mean length: 7.33708588
Min length: 4
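The length figures are computed over the non-missing values only; for this column the mean length is the total character count divided by the non-missing rows, 45622 / 6218 ≈ 7.337. A minimal sketch with Python's `statistics` module follows (the helper `length_summary` and its inputs are hypothetical). It returns an ordinary median. From the value counts for this column (every value is a single word, since the 6218 word tokens match the 6218 non-missing rows), the ordinary median length works out to 8 ("pliocene"), not the reported 9, so the report's Median length appears to come from a different median computation and should be read with care.

```python
import statistics


def length_summary(values):
    """Max, median, mean and min string length over non-missing values."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),  # ordinary (unweighted) median
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


summary = length_summary(["Middle", "Pliocene", None, "Late"])
print(summary)
```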

Characters and Unicode

Total characters: 45622
Distinct characters: 21
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%
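Unique is not the same as Distinct: Distinct counts different values, while Unique counts the values that occur exactly once. Both percentages appear to use the non-missing rows as the denominator; for example, the group column in this report has 146 unique values out of 724508 - 633218 = 91290 non-missing rows, about 0.16%, reported as 0.2%. A minimal sketch, assuming missing values are `None`; `unique_summary` and the sample column are hypothetical.

```python
from collections import Counter


def unique_summary(values):
    """Count values that occur exactly once among the non-missing entries."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    singletons = sum(1 for c in counts.values() if c == 1)
    pct = 100.0 * singletons / len(present) if present else 0.0
    return singletons, pct


# "Late" occurs twice, so only "Middle" and "Miocene" count as unique.
n_unique, p_unique = unique_summary(["Late", "Late", "Middle", None, "Miocene"])
print(n_unique, p_unique)
```

The same column has 3 distinct values but only 2 unique ones, which is why the two rows in the report can differ sharply.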

Sample

1st row: Middle
2nd row: Pliocene
3rd row: Late
4th row: Pleistocene
5th row: Miocene
Value Count Frequency (%)
pliocene 2384 38.3%
eocene 1075 17.3%
miocene 759 12.2%
late 645 10.4%
pleistocene 645 10.4%
middle 364 5.9%
oligocene 188 3.0%
paleocene 97 1.6%
early 34 0.5%
holocene 14 0.2%
Other values (2) 13 0.2%
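The word tables count lowercase tokens rather than whole values, so a value such as "Chesapeake Group" contributes both "chesapeake" and "group", and each frequency is relative to the total number of tokens (here 6218, so 2384 / 6218 ≈ 38.3% for "pliocene"). The sketch below assumes the tokens are produced by lowercasing, splitting on whitespace and stripping surrounding punctuation; that last step is an assumption about the profiler, but it would explain rows elsewhere in this report whose label is blank, because a token made only of punctuation (such as a standalone "&") becomes an empty string. The helpers `word_counts` and `word_table` are hypothetical.

```python
import string
from collections import Counter


def word_counts(values):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        for token in value.lower().split():
            # Assumption: punctuation-only tokens such as "&" end up as "".
            counts[token.strip(string.punctuation + string.whitespace)] += 1
    return counts


def word_table(values, top=10):
    """Return (word, count, percent of all tokens) for the most common words."""
    counts = word_counts(values)
    total = sum(counts.values())
    return [(w, c, round(100.0 * c / total, 1)) for w, c in counts.most_common(top)]


rows = word_table(["Chesapeake Group", "Keokuk Group", "Said; Barakat, M. G.", None])
for word, count, pct in rows:
    print(word, count, pct)
```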

Most occurring characters

ValueCountFrequency (%)
e 12099
26.5%
o 5177
11.3%
n 5176
11.3%
c 5174
11.3%
i 4342
 
9.5%
l 3726
 
8.2%
P 3126
 
6.9%
t 1302
 
2.9%
M 1123
 
2.5%
E 1109
 
2.4%
Other values (11) 3268
 
7.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 39404
86.4%
Uppercase Letter 6218
 
13.6%
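The category tables classify every character by its Unicode General Category; in this column the 6218 uppercase letters are one capital per value. Python's `unicodedata.category` returns the two-letter category code (for example `Ll`), so the sketch maps the codes to the long names used in this report. The mapping covers only the categories that appear in this report; `category_counts` and the value "Late/Early" are hypothetical.

```python
import unicodedata
from collections import Counter

# Long names for the General Category codes that occur in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters per Unicode General Category across non-missing strings."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the raw code
    return counts


counts = category_counts(["Middle", "Late/Early", None])
print(counts.most_common())
```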

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 12099
30.7%
o 5177
13.1%
n 5176
13.1%
c 5174
13.1%
i 4342
 
11.0%
l 3726
 
9.5%
t 1302
 
3.3%
a 777
 
2.0%
d 728
 
1.8%
s 645
 
1.6%
Other values (4) 258
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
P 3126
50.3%
M 1123
 
18.1%
E 1109
 
17.8%
L 646
 
10.4%
O 188
 
3.0%
H 14
 
0.2%
R 12
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 45622
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 12099
26.5%
o 5177
11.3%
n 5176
11.3%
c 5174
11.3%
i 4342
 
9.5%
l 3726
 
8.2%
P 3126
 
6.9%
t 1302
 
2.9%
M 1123
 
2.5%
E 1109
 
2.4%
Other values (11) 3268
 
7.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 45622
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 12099
26.5%
o 5177
11.3%
n 5176
11.3%
c 5174
11.3%
i 4342
 
9.5%
l 3726
 
8.2%
P 3126
 
6.9%
t 1302
 
2.9%
M 1123
 
2.5%
E 1109
 
2.4%
Other values (11) 3268
 
7.2%
Distinct: 366
Distinct (%): 0.2%
Missing: 562472
Missing (%): 77.6%
Memory size: 5.5 MiB

Length

Max length: 23
Median length: 19
Mean length: 9.036053716
Min length: 4

Characters and Unicode

Total characters: 1464166
Distinct characters: 54
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 38
Unique (%): < 0.1%

Sample

1st row: Anisian
2nd row: Hemphillian
3rd row: Middle
4th row: Emsian
5th row: Irvingtonian
Value Count Frequency (%)
hemphillian 19681 12.1%
middle 17380 10.7%
wasatchian 7037 4.3%
early 5466 3.4%
orellan 5085 3.1%
bridgerian 4799 2.9%
maastrichtian 4686 2.9%
campanian 4051 2.5%
chadronian 3871 2.4%
ypresian 3476 2.1%
Other values (350) 87399 53.6%

Most occurring characters

ValueCountFrequency (%)
a 228885
15.6%
n 195907
13.4%
i 190767
13.0%
e 105142
 
7.2%
l 96307
 
6.6%
r 75689
 
5.2%
d 61340
 
4.2%
o 52724
 
3.6%
h 47497
 
3.2%
s 40454
 
2.8%
Other values (44) 369454
25.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1300773
88.8%
Uppercase Letter 162483
 
11.1%
Space Separator 895
 
0.1%
Other Punctuation 13
 
< 0.1%
Decimal Number 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 228885
17.6%
n 195907
15.1%
i 190767
14.7%
e 105142
8.1%
l 96307
7.4%
r 75689
 
5.8%
d 61340
 
4.7%
o 52724
 
4.1%
h 47497
 
3.7%
s 40454
 
3.1%
Other values (16) 206061
15.8%
Uppercase Letter
ValueCountFrequency (%)
M 28152
17.3%
C 21480
13.2%
H 20672
12.7%
W 12315
7.6%
B 10522
 
6.5%
O 10358
 
6.4%
T 8937
 
5.5%
E 7395
 
4.6%
A 6493
 
4.0%
L 6455
 
4.0%
Other values (14) 29704
18.3%
Other Punctuation
ValueCountFrequency (%)
/ 12
92.3%
, 1
 
7.7%
Space Separator
ValueCountFrequency (%)
(space) 895
100.0%
Decimal Number
ValueCountFrequency (%)
4 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1463256
99.9%
Common 910
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 228885
15.6%
n 195907
13.4%
i 190767
13.0%
e 105142
 
7.2%
l 96307
 
6.6%
r 75689
 
5.2%
d 61340
 
4.2%
o 52724
 
3.6%
h 47497
 
3.2%
s 40454
 
2.8%
Other values (40) 368544
25.2%
Common
ValueCountFrequency (%)
(space) 895
98.4%
/ 12
 
1.3%
4 2
 
0.2%
, 1
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1464166
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 228885
15.6%
n 195907
13.4%
i 190767
13.0%
e 105142
 
7.2%
l 96307
 
6.6%
r 75689
 
5.2%
d 61340
 
4.2%
o 52724
 
3.6%
h 47497
 
3.2%
s 40454
 
2.8%
Other values (44) 369454
25.2%
Distinct: 35
Distinct (%): 1.5%
Missing: 722133
Missing (%): 99.7%
Memory size: 5.5 MiB

Length

Max length: 13
Median length: 8
Mean length: 8.232
Min length: 4

Characters and Unicode

Total characters: 19551
Distinct characters: 38
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 4
Unique (%): 0.2%

Sample

1st row: Givetian
2nd row: Turonian
3rd row: Gelasian
4th row: Gelasian
5th row: Gelasian
Value Count Frequency (%)
lutetian 829 34.9%
zanclean 319 13.4%
tortonian 217 9.1%
gelasian 200 8.4%
maastrichtian 105 4.4%
late 98 4.1%
messinian 78 3.3%
thanetian 78 3.3%
ypresian 60 2.5%
langhian 58 2.4%
Other values (25) 333 14.0%

Most occurring characters

ValueCountFrequency (%)
a 3358
17.2%
n 3107
15.9%
t 2287
11.7%
i 2268
11.6%
e 1838
9.4%
L 1015
 
5.2%
u 862
 
4.4%
l 662
 
3.4%
o 553
 
2.8%
s 534
 
2.7%
Other values (28) 3067
15.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 17176
87.9%
Uppercase Letter 2375
 
12.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 3358
19.6%
n 3107
18.1%
t 2287
13.3%
i 2268
13.2%
e 1838
10.7%
u 862
 
5.0%
l 662
 
3.9%
o 553
 
3.2%
s 534
 
3.1%
r 515
 
3.0%
Other values (13) 1192
 
6.9%
Uppercase Letter
ValueCountFrequency (%)
L 1015
42.7%
Z 319
 
13.4%
T 297
 
12.5%
G 223
 
9.4%
M 196
 
8.3%
E 90
 
3.8%
Y 60
 
2.5%
P 53
 
2.2%
C 50
 
2.1%
B 32
 
1.3%
Other values (5) 40
 
1.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 19551
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 3358
17.2%
n 3107
15.9%
t 2287
11.7%
i 2268
11.6%
e 1838
9.4%
L 1015
 
5.2%
u 862
 
4.4%
l 662
 
3.4%
o 553
 
2.8%
s 534
 
2.7%
Other values (28) 3067
15.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 19551
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 3358
17.2%
n 3107
15.9%
t 2287
11.7%
i 2268
11.6%
e 1838
9.4%
L 1015
 
5.2%
u 862
 
4.4%
l 662
 
3.4%
o 553
 
2.8%
s 534
 
2.7%
Other values (28) 3067
15.7%

group
Text

Missing 

Distinct: 557
Distinct (%): 0.6%
Missing: 633218
Missing (%): 87.4%
Memory size: 5.5 MiB

Length

Max length: 29
Median length: 28
Mean length: 14.80891664
Min length: 1

Characters and Unicode

Total characters: 1351906
Distinct characters: 57
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 146
Unique (%): 0.2%

Sample

1st row: Star Peak Group
2nd row: Chesapeake Group
3rd row: Keokuk Group
4th row: Chesapeake Group
5th row: Chesapeake Group
Value Count Frequency (%)
group 90331 46.7%
chesapeake 38410 19.9%
river 7802 4.0%
white 5751 3.0%
selma 3439 1.8%
kewanee 2702 1.4%
hamilton 2337 1.2%
osage 2256 1.2%
washita 1421 0.7%
pamunkey 1419 0.7%
Other values (577) 37508 19.4%

Most occurring characters

ValueCountFrequency (%)
e 166874
12.3%
p 131366
9.7%
a 118438
 
8.8%
r 115845
 
8.6%
o 113583
 
8.4%
(space) 102086
 
7.6%
u 98547
 
7.3%
G 90741
 
6.7%
s 54633
 
4.0%
h 50628
 
3.7%
Other values (47) 309165
22.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1056168
78.1%
Uppercase Letter 193474
 
14.3%
Space Separator 102086
 
7.6%
Other Punctuation 124
 
< 0.1%
Open Punctuation 21
 
< 0.1%
Close Punctuation 21
 
< 0.1%
Dash Punctuation 12
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 166874
15.8%
p 131366
12.4%
a 118438
11.2%
r 115845
11.0%
o 113583
10.8%
u 98547
9.3%
s 54633
 
5.2%
h 50628
 
4.8%
k 45139
 
4.3%
i 34291
 
3.2%
Other values (16) 126824
12.0%
Uppercase Letter
ValueCountFrequency (%)
G 90741
46.9%
C 43143
22.3%
R 9045
 
4.7%
W 8105
 
4.2%
S 6248
 
3.2%
M 4589
 
2.4%
P 4340
 
2.2%
K 3671
 
1.9%
O 3592
 
1.9%
H 3351
 
1.7%
Other values (15) 16649
 
8.6%
Other Punctuation
ValueCountFrequency (%)
. 88
71.0%
, 36
29.0%
Space Separator
ValueCountFrequency (%)
(space) 102086
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 12
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1249642
92.4%
Common 102264
 
7.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 166874
13.4%
p 131366
10.5%
a 118438
9.5%
r 115845
9.3%
o 113583
9.1%
u 98547
 
7.9%
G 90741
 
7.3%
s 54633
 
4.4%
h 50628
 
4.1%
k 45139
 
3.6%
Other values (41) 263848
21.1%
Common
ValueCountFrequency (%)
(space) 102086
99.8%
. 88
 
0.1%
, 36
 
< 0.1%
( 21
 
< 0.1%
) 21
 
< 0.1%
- 12
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1351906
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 166874
12.3%
p 131366
9.7%
a 118438
 
8.8%
r 115845
 
8.6%
o 113583
 
8.4%
(space) 102086
 
7.6%
u 98547
 
7.3%
G 90741
 
6.7%
s 54633
 
4.0%
h 50628
 
3.7%
Other values (47) 309165
22.9%

formation
Text

Missing 

Distinct: 5419
Distinct (%): 1.5%
Missing: 365706
Missing (%): 50.5%
Memory size: 5.5 MiB

Length

Max length: 46
Median length: 38
Mean length: 11.49027319
Min length: 3

Characters and Unicode

Total characters: 4122733
Distinct characters: 66
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1482
Unique (%): 0.4%

Sample

1st row: Prida Fm
2nd row: Yorktown Fm
3rd row: Skinner Ranch Fm
4th row: San Pedro Fm
5th row: Grande Greve Fm
Value Count Frequency (%)
fm 259134 32.0%
river 44301 5.5%
ls 39737 4.9%
stephen 31376 3.9%
green 29207 3.6%
yorktown 23754 2.9%
unknown 18762 2.3%
sh 17735 2.2%
pungo 10262 1.3%
canyon 8111 1.0%
Other values (4425) 326422 40.4%

Most occurring characters

ValueCountFrequency (%)
(space) 449999
 
10.9%
e 361227
 
8.8%
n 317355
 
7.7%
m 288475
 
7.0%
F 271104
 
6.6%
r 245377
 
6.0%
o 238913
 
5.8%
a 212844
 
5.2%
i 166070
 
4.0%
t 160119
 
3.9%
Other values (56) 1411250
34.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2858690
69.3%
Uppercase Letter 809683
 
19.6%
Space Separator 449999
 
10.9%
Other Punctuation 3867
 
0.1%
Decimal Number 156
 
< 0.1%
Open Punctuation 135
 
< 0.1%
Close Punctuation 134
 
< 0.1%
Dash Punctuation 69
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 361227
12.6%
n 317355
11.1%
m 288475
10.1%
r 245377
 
8.6%
o 238913
 
8.4%
a 212844
 
7.4%
i 166070
 
5.8%
t 160119
 
5.6%
l 128749
 
4.5%
s 112733
 
3.9%
Other values (16) 626828
21.9%
Uppercase Letter
ValueCountFrequency (%)
F 271104
33.5%
S 78359
 
9.7%
R 63222
 
7.8%
L 61354
 
7.6%
C 52642
 
6.5%
G 37852
 
4.7%
B 36649
 
4.5%
M 26756
 
3.3%
P 26718
 
3.3%
Y 24537
 
3.0%
Other values (15) 130490
16.1%
Other Punctuation
ValueCountFrequency (%)
. 2426
62.7%
, 703
 
18.2%
? 651
 
16.8%
' 64
 
1.7%
/ 19
 
0.5%
" 4
 
0.1%
Decimal Number
ValueCountFrequency (%)
1 147
94.2%
3 3
 
1.9%
9 2
 
1.3%
2 2
 
1.3%
0 2
 
1.3%
Space Separator
ValueCountFrequency (%)
(space) 449999
100.0%
Open Punctuation
ValueCountFrequency (%)
( 135
100.0%
Close Punctuation
ValueCountFrequency (%)
) 134
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 69
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3668373
89.0%
Common 454360
 
11.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 361227
 
9.8%
n 317355
 
8.7%
m 288475
 
7.9%
F 271104
 
7.4%
r 245377
 
6.7%
o 238913
 
6.5%
a 212844
 
5.8%
i 166070
 
4.5%
t 160119
 
4.4%
l 128749
 
3.5%
Other values (41) 1278140
34.8%
Common
ValueCountFrequency (%)
(space) 449999
99.0%
. 2426
 
0.5%
, 703
 
0.2%
? 651
 
0.1%
1 147
 
< 0.1%
( 135
 
< 0.1%
) 134
 
< 0.1%
- 69
 
< 0.1%
' 64
 
< 0.1%
/ 19
 
< 0.1%
Other values (5) 13
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4122733
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 449999
 
10.9%
e 361227
 
8.8%
n 317355
 
7.7%
m 288475
 
7.0%
F 271104
 
6.6%
r 245377
 
6.0%
o 238913
 
5.8%
a 212844
 
5.2%
i 166070
 
4.0%
t 160119
 
3.9%
Other values (56) 1411250
34.2%

member
Text

Missing 

Distinct: 1626
Distinct (%): 2.0%
Missing: 643191
Missing (%): 88.8%
Memory size: 5.5 MiB

Length

Max length: 31
Median length: 30
Mean length: 13.99831524
Min length: 1

Characters and Unicode

Total characters: 1138301
Distinct characters: 70
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 471
Unique (%): 0.6%

Sample

1st row: Fossil Hill Mbr
2nd row: Decie Ranch Mbr
3rd row: Millersburg Mbr
4th row: Thin-Bedded Zone Of Udden
5th row: Burgess Sh Mbr
Value Count Frequency (%)
mbr 79698 34.1%
sh 36967 15.8%
burgess 30811 13.2%
ls 6535 2.8%
creek 4230 1.8%
sunken 3525 1.5%
meadow 3525 1.5%
ranch 3361 1.4%
francis 2603 1.1%
b 2492 1.1%
Other values (1500) 60135 25.7%

Most occurring characters

ValueCountFrequency (%)
(space) 152565
13.4%
r 138201
12.1%
M 87327
 
7.7%
s 86157
 
7.6%
b 84523
 
7.4%
e 79157
 
7.0%
h 47967
 
4.2%
S 46866
 
4.1%
u 42615
 
3.7%
a 41195
 
3.6%
Other values (60) 331728
29.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 749978
65.9%
Uppercase Letter 232978
 
20.5%
Space Separator 152565
 
13.4%
Decimal Number 2131
 
0.2%
Other Punctuation 324
 
< 0.1%
Dash Punctuation 290
 
< 0.1%
Open Punctuation 17
 
< 0.1%
Close Punctuation 17
 
< 0.1%
Math Symbol 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 138201
18.4%
s 86157
11.5%
b 84523
11.3%
e 79157
10.6%
h 47967
 
6.4%
u 42615
 
5.7%
a 41195
 
5.5%
g 38517
 
5.1%
n 36464
 
4.9%
i 27554
 
3.7%
Other values (16) 127628
17.0%
Uppercase Letter
ValueCountFrequency (%)
M 87327
37.5%
S 46866
20.1%
B 39596
17.0%
C 10761
 
4.6%
L 9429
 
4.0%
R 5451
 
2.3%
F 4926
 
2.1%
P 4323
 
1.9%
G 4164
 
1.8%
W 4116
 
1.8%
Other values (15) 16019
 
6.9%
Decimal Number
ValueCountFrequency (%)
1 858
40.3%
2 337
 
15.8%
3 289
 
13.6%
4 247
 
11.6%
5 130
 
6.1%
0 124
 
5.8%
6 102
 
4.8%
7 24
 
1.1%
9 16
 
0.8%
8 4
 
0.2%
Other Punctuation
ValueCountFrequency (%)
, 131
40.4%
. 128
39.5%
? 64
19.8%
' 1
 
0.3%
Space Separator
ValueCountFrequency (%)
(space) 152565
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 290
100.0%
Open Punctuation
ValueCountFrequency (%)
( 17
100.0%
Close Punctuation
ValueCountFrequency (%)
) 17
100.0%
Math Symbol
ValueCountFrequency (%)
= 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 982956
86.4%
Common 155345
 
13.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 138201
14.1%
M 87327
 
8.9%
s 86157
 
8.8%
b 84523
 
8.6%
e 79157
 
8.1%
h 47967
 
4.9%
S 46866
 
4.8%
u 42615
 
4.3%
a 41195
 
4.2%
B 39596
 
4.0%
Other values (41) 289352
29.4%
Common
ValueCountFrequency (%)
(space) 152565
98.2%
1 858
 
0.6%
2 337
 
0.2%
- 290
 
0.2%
3 289
 
0.2%
4 247
 
0.2%
, 131
 
0.1%
5 130
 
0.1%
. 128
 
0.1%
0 124
 
0.1%
Other values (9) 246
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1138301
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 152565
13.4%
r 138201
12.1%
M 87327
 
7.7%
s 86157
 
7.6%
b 84523
 
7.4%
e 79157
 
7.0%
h 47967
 
4.2%
S 46866
 
4.1%
u 42615
 
3.7%
a 41195
 
3.6%
Other values (60) 331728
29.1%

typeStatus
Text

Missing 

Distinct: 15
Distinct (%): < 0.1%
Missing: 582086
Missing (%): 80.3%
Memory size: 5.5 MiB

Length

Max length: 15
Median length: 8
Mean length: 7.803239668
Min length: 4

Characters and Unicode

Total characters: 1111353
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PARATYPE
2nd row: PARATYPE
3rd row: PARATYPE
4th row: TYPE
5th row: HOLOTYPE
Value Count Frequency (%)
paratype 74612 52.4%
holotype 34645 24.3%
syntype 19534 13.7%
type 7903 5.5%
paralectotype 2966 2.1%
lectotype 1051 0.7%
plastoholotype 593 0.4%
plastotype 389 0.3%
plastoparatype 282 0.2%
plastosyntype 253 0.2%
Other values (5) 194 0.1%

Most occurring characters

ValueCountFrequency (%)
P 221818
20.0%
Y 162209
14.6%
A 157256
14.1%
T 147992
13.3%
E 146600
13.2%
R 77860
 
7.0%
O 76223
 
6.9%
L 40808
 
3.7%
H 35238
 
3.2%
S 21351
 
1.9%
Other values (3) 23998
 
2.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 1111353
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
P 221818
20.0%
Y 162209
14.6%
A 157256
14.1%
T 147992
13.3%
E 146600
13.2%
R 77860
 
7.0%
O 76223
 
6.9%
L 40808
 
3.7%
H 35238
 
3.2%
S 21351
 
1.9%
Other values (3) 23998
 
2.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 1111353
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
P 221818
20.0%
Y 162209
14.6%
A 157256
14.1%
T 147992
13.3%
E 146600
13.2%
R 77860
 
7.0%
O 76223
 
6.9%
L 40808
 
3.7%
H 35238
 
3.2%
S 21351
 
1.9%
Other values (3) 23998
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1111353
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
P 221818
20.0%
Y 162209
14.6%
A 157256
14.1%
T 147992
13.3%
E 146600
13.2%
R 77860
 
7.0%
O 76223
 
6.9%
L 40808
 
3.7%
H 35238
 
3.2%
S 21351
 
1.9%
Other values (3) 23998
 
2.2%

identifiedBy
Text

Missing 

Distinct: 2463
Distinct (%): 1.2%
Missing: 521981
Missing (%): 72.0%
Memory size: 5.5 MiB

Length

Max length: 147
Median length: 124
Mean length: 22.47668212
Min length: 2

Characters and Unicode

Total characters: 4552135
Distinct characters: 68
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 535
Unique (%): 0.3%

Sample

1st row: Silberling; Nichols
2nd row: Vaughan
3rd row: Harper; Boucot
4th row: Said; Barakat, M. G.
5th row: Smith
Value Count Frequency (%)
united 21468 3.2%
states 21082 3.2%
of 20281 3.1%
museum 15734 2.4%
helen 15316 2.3%
(blank) 12006 1.8%
natural 11887 1.8%
history 11620 1.8%
institution 11572 1.7%
smithsonian 11571 1.7%
Other values (2466) 510240 77.0%

Most occurring characters

ValueCountFrequency (%)
(space) 460250
 
10.1%
e 280098
 
6.2%
o 272102
 
6.0%
a 259642
 
5.7%
n 241275
 
5.3%
t 230888
 
5.1%
r 226036
 
5.0%
i 214007
 
4.7%
l 181066
 
4.0%
s 174306
 
3.8%
Other values (58) 2012465
44.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2806351
61.6%
Uppercase Letter 908175
 
20.0%
Space Separator 460250
 
10.1%
Other Punctuation 280258
 
6.2%
Close Punctuation 40168
 
0.9%
Open Punctuation 40168
 
0.9%
Dash Punctuation 16765
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 280098
10.0%
o 272102
9.7%
a 259642
9.3%
n 241275
 
8.6%
t 230888
 
8.2%
r 226036
 
8.1%
i 214007
 
7.6%
l 181066
 
6.5%
s 174306
 
6.2%
u 121224
 
4.3%
Other values (22) 605707
21.6%
Uppercase Letter
ValueCountFrequency (%)
S 117932
 
13.0%
T 78022
 
8.6%
A 60143
 
6.6%
N 59104
 
6.5%
C 57622
 
6.3%
E 56100
 
6.2%
I 46266
 
5.1%
D 44046
 
4.8%
H 42705
 
4.7%
U 40270
 
4.4%
Other values (16) 305965
33.7%
Other Punctuation
ValueCountFrequency (%)
, 138675
49.5%
. 77116
27.5%
; 64257
22.9%
/ 177
 
0.1%
' 23
 
< 0.1%
& 10
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 460250
100.0%
Close Punctuation
ValueCountFrequency (%)
) 40168
100.0%
Open Punctuation
ValueCountFrequency (%)
( 40168
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 16765
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3714526
81.6%
Common 837609
 
18.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 280098
 
7.5%
o 272102
 
7.3%
a 259642
 
7.0%
n 241275
 
6.5%
t 230888
 
6.2%
r 226036
 
6.1%
i 214007
 
5.8%
l 181066
 
4.9%
s 174306
 
4.7%
u 121224
 
3.3%
Other values (48) 1513882
40.8%
Common
ValueCountFrequency (%)
(space) 460250
54.9%
, 138675
 
16.6%
. 77116
 
9.2%
; 64257
 
7.7%
) 40168
 
4.8%
( 40168
 
4.8%
- 16765
 
2.0%
/ 177
 
< 0.1%
' 23
 
< 0.1%
& 10
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4550350
> 99.9%
None 1785
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 460250
 
10.1%
e 280098
 
6.2%
o 272102
 
6.0%
a 259642
 
5.7%
n 241275
 
5.3%
t 230888
 
5.1%
r 226036
 
5.0%
i 214007
 
4.7%
l 181066
 
4.0%
s 174306
 
3.8%
Other values (52) 2010680
44.2%
None
ValueCountFrequency (%)
ñ 1143
64.0%
ý 251
 
14.1%
š 251
 
14.1%
ö 138
 
7.7%
ú 1
 
0.1%
í 1
 
0.1%
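The 1785 characters this column places in block "None" are exactly the six accented letters listed above (1143 + 251 + 251 + 138 + 1 + 1 = 1785), all of which lie outside ASCII. When cleaning identifier names it can help to list such characters and check how they are encoded. A minimal sketch that counts characters with code points above 127; `non_ascii_chars` and the sample names are hypothetical.

```python
import unicodedata
from collections import Counter


def non_ascii_chars(values):
    """Count characters with code points above 127 in non-missing strings."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        counts.update(ch for ch in value if ord(ch) > 127)
    return counts


found = non_ascii_chars(["Muñoz", "Smith", "Schröder", None])
for ch, n in found.most_common():
    print(repr(ch), n, unicodedata.name(ch))
```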

acceptedNameUsageID
Real number (ℝ)

Missing 

Distinct: 58335
Distinct (%): 10.6%
Missing: 171789
Missing (%): 23.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 5515085.25
Minimum: 1
Maximum: 12385426
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.5 MiB

Quantile statistics

Minimum: 1
5th percentile: 216
Q1: 3249393
Median: 4941659
Q3: 8513230
95th percentile: 9626241
Maximum: 12385426
Range: 12385425
Interquartile range (IQR): 5263837

Descriptive statistics

Standard deviation: 3184869.125
Coefficient of variation (CV): 0.5774832084
Kurtosis: -0.8885403732
Mean: 5515085.25
Median Absolute Deviation (MAD): 2688948
Skewness: -0.1769613565
Sum: 3.048292404 × 10^12
Variance: 1.014339134 × 10^13
Monotonicity: Not monotonic
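Several of the statistics above are tied together by simple identities that can be checked against the report: CV = standard deviation / mean = 3184869.125 / 5515085.25 ≈ 0.5775; IQR = Q3 - Q1 = 8513230 - 3249393 = 5263837; Range = 12385426 - 1 = 12385425; Variance = 3184869.125² ≈ 1.0143 × 10^13; and Sum = mean × non-missing rows = 5515085.25 × 552719 ≈ 3.0483 × 10^12. The sketch recomputes these quantities on a small hypothetical sample with Python's `statistics` module. It assumes a sample (n - 1) standard deviation, linearly interpolated quartiles (`method="inclusive"`) and an unscaled MAD, the median of |x - median|; these are assumptions about the profiler, not something the report states. `describe` is a hypothetical helper.

```python
import statistics


def describe(values):
    """Quantile and dispersion summary for a numeric column (missing as None)."""
    data = sorted(v for v in values if v is not None)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1)
    # Linear interpolation between order statistics, like numpy's default.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    mad = statistics.median(abs(x - median) for x in data)  # unscaled MAD
    return {
        "mean": mean,
        "std": sd,
        "cv": sd / mean,
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "range": data[-1] - data[0],
        "mad": mad,
    }


print(describe([2, 4, None, 4, 5, 7, 9]))
```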
Histogram with fixed size bins (bins=50)
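A histogram with fixed size bins splits the interval from the minimum to the maximum into equal-width bins; with bins=50 and this column's range of 12385425, each bin would span 12385425 / 50 = 247708.5 identifier values. A minimal pure-Python sketch, assuming numpy-style behaviour in which the last bin is closed on the right so the maximum is counted; `fixed_width_histogram` is a hypothetical helper and missing values are skipped.

```python
def fixed_width_histogram(values, bins=50):
    """Equal-width histogram over [min, max]; the last bin includes the max."""
    data = [v for v in values if v is not None]
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        if width == 0:
            index = 0  # all values identical: everything lands in one bin
        else:
            index = min(int((x - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


edges, counts = fixed_width_histogram([1, 2, 2, 3, 10, None], bins=3)
print(edges)
print(counts)
```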
Value Count Frequency (%)
216 16872 2.3%
8513230 13693 1.9%
4806028 12281 1.7%
6 11457 1.6%
359 4656 0.6%
44 4268 0.6%
729 3674 0.5%
353 3566 0.5%
2481460 3232 0.4%
4832444 3022 0.4%
Other values (58325) 475998 65.7%
(Missing) 171789 23.7%
Smallest values
Value Count Frequency (%)
1 1114 0.2%
6 11457 1.6%
42 952 0.1%
43 51 < 0.1%
44 4268 0.6%
Largest values
Value Count Frequency (%)
12385426 4 < 0.1%
12385220 2 < 0.1%
12379591 6 < 0.1%
12362277 15 < 0.1%
12358726 5 < 0.1%
Distinct: 65364
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 124
Median length: 82
Mean length: 24.76860849
Min length: 3

Characters and Unicode

Total characters: 17945055
Distinct characters: 109
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 24744
Unique (%): 3.4%

Sample

1st row: incertae sedis
2nd row: Damaliscus lunatus (Burchell, 1823)
3rd row: Acrochordiceras hyatti Meek, 1877
4th row: Discocyclina sculpturata (Cushman, 1919)
5th row: Odontaspis cuspidata (Agassiz, 1843)
Value Count Frequency (%)
incertae 171789 7.4%
sedis 171789 7.4%
(blank) 80645 3.5%
walcott 31003 1.3%
cooper 24261 1.1%
cushman 17003 0.7%
insecta 16882 0.7%
1912 16564 0.7%
grant 16169 0.7%
1976 14713 0.6%
Other values (47365) 1749493 75.7%

Most occurring characters

ValueCountFrequency (%)
(space) 1585803
 
8.8%
e 1485632
 
8.3%
a 1415466
 
7.9%
i 1243670
 
6.9%
s 1115918
 
6.2%
r 978896
 
5.5%
n 888307
 
5.0%
o 817782
 
4.6%
t 774995
 
4.3%
l 698421
 
3.9%
Other values (99) 6940165
38.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 12560571
70.0%
Decimal Number 1825392
 
10.2%
Space Separator 1585803
 
8.8%
Uppercase Letter 1178126
 
6.6%
Other Punctuation 581846
 
3.2%
Close Punctuation 105296
 
0.6%
Open Punctuation 105296
 
0.6%
Dash Punctuation 2722
 
< 0.1%
Math Symbol 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1485632
11.8%
a 1415466
11.3%
i 1243670
9.9%
s 1115918
8.9%
r 978896
 
7.8%
n 888307
 
7.1%
o 817782
 
6.5%
t 774995
 
6.2%
l 698421
 
5.6%
c 591840
 
4.7%
Other values (47) 2549644
20.3%
Uppercase Letter
ValueCountFrequency (%)
C 146701
12.5%
P 100181
 
8.5%
S 99223
 
8.4%
B 93794
 
8.0%
M 79260
 
6.7%
G 78038
 
6.6%
W 66342
 
5.6%
A 64563
 
5.5%
L 63670
 
5.4%
H 59251
 
5.0%
Other values (22) 327103
27.8%
Decimal Number
ValueCountFrequency (%)
1 538747
29.5%
9 311438
17.1%
8 283152
15.5%
7 137355
 
7.5%
6 113287
 
6.2%
5 102651
 
5.6%
2 99760
 
5.5%
3 91117
 
5.0%
4 74403
 
4.1%
0 73482
 
4.0%
Other Punctuation
ValueCountFrequency (%)
, 460573
79.2%
& 80645
 
13.9%
. 33035
 
5.7%
' 7591
 
1.3%
? 2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 1585803
100.0%
Close Punctuation
ValueCountFrequency (%)
) 105296
100.0%
Open Punctuation
ValueCountFrequency (%)
( 105296
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2722
100.0%
Math Symbol
ValueCountFrequency (%)
× 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 13738697
76.6%
Common 4206358
 
23.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1485632
 
10.8%
a 1415466
 
10.3%
i 1243670
 
9.1%
s 1115918
 
8.1%
r 978896
 
7.1%
n 888307
 
6.5%
o 817782
 
6.0%
t 774995
 
5.6%
l 698421
 
5.1%
c 591840
 
4.3%
Other values (79) 3727770
27.1%
Common
ValueCountFrequency (%)
(space) 1585803
37.7%
1 538747
 
12.8%
, 460573
 
10.9%
9 311438
 
7.4%
8 283152
 
6.7%
7 137355
 
3.3%
6 113287
 
2.7%
) 105296
 
2.5%
( 105296
 
2.5%
5 102651
 
2.4%
Other values (10) 462760
 
11.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 17931313
99.9%
None 13742
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 1585803
 
8.8%
e 1485632
 
8.3%
a 1415466
 
7.9%
i 1243670
 
6.9%
s 1115918
 
6.2%
r 978896
 
5.5%
n 888307
 
5.0%
o 817782
 
4.6%
t 774995
 
4.3%
l 698421
 
3.9%
Other values (61) 6926423
38.6%
None
ValueCountFrequency (%)
ü 3637
26.5%
ö 2722
19.8%
è 2108
15.3%
é 2051
14.9%
ú 1773
12.9%
ã 292
 
2.1%
ë 259
 
1.9%
ž 160
 
1.2%
ä 153
 
1.1%
å 121
 
0.9%
Other values (28) 466
 
3.4%

higherClassification
Text

Missing 

Distinct: 3844
Distinct (%): 0.7%
Missing: 172643
Missing (%): 23.8%
Memory size: 5.5 MiB

Length

Max length: 141
Median length: 123
Mean length: 59.08444638
Min length: 5

Characters and Unicode

Total characters: 32606638
Distinct characters: 61
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 743
Unique (%): 0.1%

Sample

1st row: Animalia, Chordata, Vertebrata, Mammalia, Eutheria, Laurasiatheria, Artiodactyla, Ruminatia, Bovidae
2nd row: Animalia, Mollusca, Cephalopoda, Ammonoidea
3rd row: Chromista, Foraminifera, Globothalamea, Rotaliida, Discocyclinidae
4th row: Animalia, Chordata, Vertebrata, Pisces, Chondrichthyes, Elasmobranchii, Galeomorphii, Lamniformes, Odontaspididae
5th row: Animalia, Brachiopoda, Rhynchonellata, Orthida, Enteletidae
Value Count Frequency (%)
animalia 448323 15.7%
chordata 148700 5.2%
vertebrata 148618 5.2%
arthropoda 100318 3.5%
mollusca 69025 2.4%
brachiopoda 66748 2.3%
foraminifera 66301 2.3%
chromista 65999 2.3%
mammalia 60027 2.1%
eutheria 57586 2.0%
Other values (3834) 1620986 56.8%
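As the sample rows show, higherClassification stores the taxa above each name as a single comma-separated string of varying depth (4 to 9 entries in the samples), with no rank labels. Splitting on commas recovers the ordered list; in the sample rows the first element is the kingdom, but because depth varies, later positions cannot be mapped to fixed ranks without extra information. A minimal sketch; `split_classification` is a hypothetical helper and the inputs are copied from the sample rows.

```python
from collections import Counter


def split_classification(value):
    """Split a comma-separated higherClassification string into its taxa."""
    if value is None:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


samples = [
    "Animalia, Mollusca, Cephalopoda, Ammonoidea",
    "Chromista, Foraminifera, Globothalamea, Rotaliida, Discocyclinidae",
    "Animalia, Brachiopoda, Rhynchonellata, Orthida, Enteletidae",
    None,
]
# Count the top-level (first) taxon of each non-missing classification.
top_level = Counter(split_classification(s)[0] for s in samples if s)
print(split_classification(samples[0]))
print(top_level)
```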

Most occurring characters

ValueCountFrequency (%)
a 4706865
14.4%
i 3184420
 
9.8%
(space) 2300766
 
7.1%
, 2260526
 
6.9%
o 2052009
 
6.3%
r 2005114
 
6.1%
e 1809015
 
5.5%
t 1671086
 
5.1%
l 1501858
 
4.6%
n 1400746
 
4.3%
Other values (51) 9714233
29.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 25197474
77.3%
Uppercase Letter 2811914
 
8.6%
Space Separator 2300766
 
7.1%
Other Punctuation 2295928
 
7.0%
Decimal Number 471
 
< 0.1%
Open Punctuation 42
 
< 0.1%
Close Punctuation 42
 
< 0.1%
Dash Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 4706865
18.7%
i 3184420
12.6%
o 2052009
8.1%
r 2005114
8.0%
e 1809015
 
7.2%
t 1671086
 
6.6%
l 1501858
 
6.0%
n 1400746
 
5.6%
d 1257138
 
5.0%
m 1113235
 
4.4%
Other values (16) 4495988
17.8%
Uppercase Letter
ValueCountFrequency (%)
A 662527
23.6%
C 427513
15.2%
P 199516
 
7.1%
M 161377
 
5.7%
V 161299
 
5.7%
S 144831
 
5.2%
E 143204
 
5.1%
R 141162
 
5.0%
B 123534
 
4.4%
G 116236
 
4.1%
Other values (16) 530715
18.9%
Other Punctuation
ValueCountFrequency (%)
, 2260526
98.5%
. 35391
 
1.5%
" 8
 
< 0.1%
? 3
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 2300766
100.0%
Decimal Number
ValueCountFrequency (%)
0 471
100.0%
Open Punctuation
ValueCountFrequency (%)
( 42
100.0%
Close Punctuation
ValueCountFrequency (%)
) 42
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 28009388
85.9%
Common 4597250
 
14.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 4706865
16.8%
i 3184420
11.4%
o 2052009
 
7.3%
r 2005114
 
7.2%
e 1809015
 
6.5%
t 1671086
 
6.0%
l 1501858
 
5.4%
n 1400746
 
5.0%
d 1257138
 
4.5%
m 1113235
 
4.0%
Other values (42) 7307902
26.1%
Common
ValueCountFrequency (%)
(space) 2300766
50.0%
, 2260526
49.2%
. 35391
 
0.8%
0 471
 
< 0.1%
( 42
 
< 0.1%
) 42
 
< 0.1%
" 8
 
< 0.1%
? 3
 
< 0.1%
- 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 32606638
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 4706865
14.4%
i 3184420
 
9.8%
(space) 2300766
 
7.1%
, 2260526
 
6.9%
o 2052009
 
6.3%
r 2005114
 
6.1%
e 1809015
 
5.5%
t 1671086
 
5.1%
l 1501858
 
4.6%
n 1400746
 
4.3%
Other values (51) 9714233
29.8%
Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 MiB

Length

Max length: 14
Median length: 8
Mean length: 9.46887543
Min length: 5

Characters and Unicode

Total characters: 6860276
Distinct characters: 22
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: incertae sedis
2nd row: Animalia
3rd row: Animalia
4th row: Chromista
5th row: Animalia
Value Count Frequency (%)
animalia 446288 49.8%
incertae 171929 19.2%
sedis 171929 19.2%
chromista 69124 7.7%
plantae 36324 4.1%
bacteria 502 0.1%
protozoa 287 < 0.1%
fungi 54 < 0.1%
2025-01-08T16:24:17.818849image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/

Most occurring characters

ValueCountFrequency (%)
i 1306114
19.0%
a 1207568
17.6%
n 654595
9.5%
e 552613
8.1%
m 515412
 
7.5%
l 482612
 
7.0%
A 446288
 
6.5%
s 412982
 
6.0%
t 278166
 
4.1%
r 241842
 
3.5%
Other values (12) 762084
11.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6135768
89.4%
Uppercase Letter 552579
 
8.1%
Space Separator 171929
 
2.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 1306114
21.3%
a 1207568
19.7%
n 654595
10.7%
e 552613
9.0%
m 515412
 
8.4%
l 482612
 
7.9%
s 412982
 
6.7%
t 278166
 
4.5%
r 241842
 
3.9%
c 172431
 
2.8%
Other values (6) 311433
 
5.1%
Uppercase Letter
ValueCountFrequency (%)
A 446288
80.8%
C 69124
 
12.5%
P 36611
 
6.6%
B 502
 
0.1%
F 54
 
< 0.1%
Space Separator
ValueCountFrequency (%)
171929
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6688347
97.5%
Common 171929
 
2.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 1306114
19.5%
a 1207568
18.1%
n 654595
9.8%
e 552613
8.3%
m 515412
 
7.7%
l 482612
 
7.2%
A 446288
 
6.7%
s 412982
 
6.2%
t 278166
 
4.2%
r 241842
 
3.6%
Other values (11) 590155
8.8%
Common
ValueCountFrequency (%)
171929
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6860276
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 1306114
19.0%
a 1207568
17.6%
n 654595
9.5%
e 552613
8.1%
m 515412
 
7.5%
l 482612
 
7.0%
A 446288
 
6.5%
s 412982
 
6.0%
t 278166
 
4.1%
r 241842
 
3.5%
Other values (12) 762084
11.1%
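The kingdom value table counts lower-cased words, not whole cells: each "incertae sedis" record adds one "incertae" and one "sedis" (171929 each), and the percentages are shares of all 896437 words rather than of the 724508 records (446288 / 896437 ≈ 49.8% for "animalia"). A minimal sketch of that word-level count, assuming simple whitespace splitting (the profiler's own tokenizer may differ); `word_frequencies` is a hypothetical helper.

```python
from collections import Counter


def word_frequencies(values):
    """Lower-case, whitespace-split word counts with each word's share of all words."""
    counts = Counter()
    for value in values:
        if value is None:  # missing cells contribute no words
            continue
        counts.update(value.lower().split())
    total = sum(counts.values())
    # (word, count, percentage of all words), most frequent first
    return [(word, n, 100.0 * n / total) for word, n in counts.most_common()]


sample = ["Animalia", "incertae sedis", "Animalia", "Chromista"]
for word, n, pct in word_frequencies(sample):
    print(f"{word} {n} {pct:.1f}%")
```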

phylum
Text

Missing 

Distinct40
Distinct (%)< 0.1%
Missing192842
Missing (%)26.6%
Memory size5.5 MiB

Length

Max length17
Median length16
Mean length9.682191451
Min length7

Characters and Unicode

Total characters5147692
Distinct characters35
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique3
Unique (%)< 0.1%

Sample

1st rowChordata
2nd rowMollusca
3rd rowForaminifera
4th rowChordata
5th rowBrachiopoda
ValueCountFrequency (%)
chordata 148527 27.9%
arthropoda 101505 19.1%
mollusca 66708 12.5%
foraminifera 66099 12.4%
brachiopoda 65633 12.3%
echinodermata 27100 5.1%
tracheophyta 21340 4.0%
bryozoa 13677 2.6%
cnidaria 6914 1.3%
annelida 3027 0.6%
Other values (30) 11136 2.1%

Most occurring characters

ValueCountFrequency (%)
a 872631
17.0%
o 702054
13.6%
r 631416
12.3%
h 394973
 
7.7%
d 356870
 
6.9%
t 305449
 
5.9%
i 250365
 
4.9%
p 194047
 
3.8%
c 186323
 
3.6%
C 156569
 
3.0%
Other values (25) 1096995
21.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4616026
89.7%
Uppercase Letter 531666
 
10.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 872631
18.9%
o 702054
15.2%
r 631416
13.7%
h 394973
8.6%
d 356870
7.7%
t 305449
 
6.6%
i 250365
 
5.4%
p 194047
 
4.2%
c 186323
 
4.0%
l 138356
 
3.0%
Other values (10) 583542
12.6%
Uppercase Letter
ValueCountFrequency (%)
C 156569
29.4%
A 104600
19.7%
B 79330
14.9%
M 66964
12.6%
F 66099
12.4%
E 27115
 
5.1%
T 21386
 
4.0%
P 4292
 
0.8%
O 2549
 
0.5%
H 2348
 
0.4%
Other values (5) 414
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 5147692
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 872631
17.0%
o 702054
13.6%
r 631416
12.3%
h 394973
 
7.7%
d 356870
 
6.9%
t 305449
 
5.9%
i 250365
 
4.9%
p 194047
 
3.8%
c 186323
 
3.6%
C 156569
 
3.0%
Other values (25) 1096995
21.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5147692
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 872631
17.0%
o 702054
13.6%
r 631416
12.3%
h 394973
 
7.7%
d 356870
 
6.9%
t 305449
 
5.9%
i 250365
 
4.9%
p 194047
 
3.8%
c 186323
 
3.6%
C 156569
 
3.0%
Other values (25) 1096995
21.3%
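Missing (%) is taken over all 724508 records, whereas Distinct (%) appears to be taken over the non-missing cells only; infraspecificEpithet, for instance, reports 1469 distinct values among 724508 - 718207 = 6301 non-missing cells, i.e. 23.3%. A minimal sketch of both ratios, assuming missing cells are `None`; `missing_summary` is a hypothetical helper.

```python
def missing_summary(values):
    """Missing count/percentage over all rows; distinct percentage over non-missing rows."""
    total = len(values)
    present = [v for v in values if v is not None]
    missing = total - len(present)
    distinct = len(set(present))
    return {
        "missing": missing,
        "missing_pct": 100.0 * missing / total if total else 0.0,
        "distinct": distinct,
        "distinct_pct": 100.0 * distinct / len(present) if present else 0.0,
    }


print(missing_summary(["Chordata", None, "Mollusca", "Chordata"]))
# phylum: 192842 of 724508 cells are missing
print(round(100.0 * 192842 / 724508, 1))
```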

class
Text

Missing 

Distinct92
Distinct (%)< 0.1%
Missing272566
Missing (%)37.6%
Memory size5.5 MiB

Length

Max length18
Median length15
Mean length9.989064969
Min length4

Characters and Unicode

Total characters4514478
Distinct characters42
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique6
Unique (%)< 0.1%

Sample

1st rowMammalia
2nd rowCephalopoda
3rd rowGlobothalamea
4th rowElasmobranchii
5th rowRhynchonellata
ValueCountFrequency (%)
mammalia 59795 13.2%
globothalamea 42882 9.5%
rhynchonellata 39551 8.8%
aves 34584 7.7%
insecta 32733 7.2%
gastropoda 24245 5.4%
ostracoda 23481 5.2%
elasmobranchii 23303 5.2%
trilobita 22315 4.9%
bivalvia 22257 4.9%
Other values (82) 126796 28.1%

Most occurring characters

ValueCountFrequency (%)
a 873045
19.3%
o 420982
 
9.3%
l 396601
 
8.8%
i 316058
 
7.0%
t 241147
 
5.3%
e 235006
 
5.2%
m 212254
 
4.7%
n 206140
 
4.6%
h 195566
 
4.3%
s 167736
 
3.7%
Other values (32) 1249943
27.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4062536
90.0%
Uppercase Letter 451942
 
10.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 873045
21.5%
o 420982
10.4%
l 396601
9.8%
i 316058
 
7.8%
t 241147
 
5.9%
e 235006
 
5.8%
m 212254
 
5.2%
n 206140
 
5.1%
h 195566
 
4.8%
s 167736
 
4.1%
Other values (13) 798001
19.6%
Uppercase Letter
ValueCountFrequency (%)
M 92592
20.5%
G 71598
15.8%
A 42619
9.4%
R 40038
8.9%
I 32733
 
7.2%
E 32684
 
7.2%
C 30885
 
6.8%
T 29606
 
6.6%
B 25001
 
5.5%
O 23622
 
5.2%
Other values (9) 30564
 
6.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 4514478
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 873045
19.3%
o 420982
 
9.3%
l 396601
 
8.8%
i 316058
 
7.0%
t 241147
 
5.3%
e 235006
 
5.2%
m 212254
 
4.7%
n 206140
 
4.6%
h 195566
 
4.3%
s 167736
 
3.7%
Other values (32) 1249943
27.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4514478
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 873045
19.3%
o 420982
 
9.3%
l 396601
 
8.8%
i 316058
 
7.0%
t 241147
 
5.3%
e 235006
 
5.2%
m 212254
 
4.7%
n 206140
 
4.6%
h 195566
 
4.3%
s 167736
 
3.7%
Other values (32) 1249943
27.7%
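The reported mean length is the total character count divided by the number of non-missing values: for class, 4514478 / 451942 ≈ 9.989. A minimal sketch of the minimum, maximum and mean lengths, assuming lengths are plain `len()` counts; the median length is left out because how the profiler derives it is not stated here (for class it reports 15, well above the mean). `length_stats` is a hypothetical helper.

```python
def length_stats(values):
    """Min, max and mean character length over non-missing string values."""
    lengths = [len(v) for v in values if v is not None]
    if not lengths:
        raise ValueError("no non-missing values")
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "total_characters": sum(lengths),
    }


print(length_stats(["Mammalia", "Aves", None, "Elasmobranchii"]))
```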

order
Text

Missing 

Distinct484
Distinct (%)0.1%
Missing369296
Missing (%)51.0%
Memory size5.5 MiB

Length

Max length21
Median length17
Mean length11.06623369
Min length5

Characters and Unicode

Total characters3930859
Distinct characters50
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique41
Unique (%)< 0.1%

Sample

1st rowArtiodactyla
2nd rowCeratitida
3rd rowRotaliida
4th rowLamniformes
5th rowProcellariiformes
ValueCountFrequency (%)
rotaliida 32460 9.1%
diptera 14185 4.0%
porocephalida 14086 4.0%
podocopida 12424 3.5%
lamniformes 11376 3.2%
cetacea 10382 2.9%
procellariiformes 9895 2.8%
artiodactyla 8981 2.5%
terebratulida 8715 2.5%
perissodactyla 7870 2.2%
Other values (474) 224838 63.3%

Most occurring characters

ValueCountFrequency (%)
a 543173
13.8%
i 471633
12.0%
o 372619
 
9.5%
r 281531
 
7.2%
e 276408
 
7.0%
d 260645
 
6.6%
t 210238
 
5.3%
l 209185
 
5.3%
s 166314
 
4.2%
c 134203
 
3.4%
Other values (40) 1004910
25.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3575647
91.0%
Uppercase Letter 355212
 
9.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 543173
15.2%
i 471633
13.2%
o 372619
10.4%
r 281531
7.9%
e 276408
7.7%
d 260645
7.3%
t 210238
 
5.9%
l 209185
 
5.9%
s 166314
 
4.7%
c 134203
 
3.8%
Other values (16) 649698
18.2%
Uppercase Letter
ValueCountFrequency (%)
P 86599
24.4%
C 54230
15.3%
R 49477
13.9%
L 31460
 
8.9%
A 26066
 
7.3%
T 20328
 
5.7%
D 18011
 
5.1%
N 13824
 
3.9%
M 13070
 
3.7%
S 11063
 
3.1%
Other values (14) 31084
 
8.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 3930859
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 543173
13.8%
i 471633
12.0%
o 372619
 
9.5%
r 281531
 
7.2%
e 276408
 
7.0%
d 260645
 
6.6%
t 210238
 
5.3%
l 209185
 
5.3%
s 166314
 
4.2%
c 134203
 
3.4%
Other values (40) 1004910
25.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3930859
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 543173
13.8%
i 471633
12.0%
o 372619
 
9.5%
r 281531
 
7.2%
e 276408
 
7.0%
d 260645
 
6.6%
t 210238
 
5.3%
l 209185
 
5.3%
s 166314
 
4.2%
c 134203
 
3.4%
Other values (40) 1004910
25.6%
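Each value table shows the ten most frequent entries and folds the remainder into an "Other values (k)" row. For order, the ten listed counts sum to 130374, and adding the 224838 records spread over the other 474 values gives the 355212 non-missing records. A minimal sketch of that folding with `collections.Counter`, assuming N = 10 by default; unlike the report, it keeps the original capitalization, and `top_n_with_other` is a hypothetical helper.

```python
from collections import Counter


def top_n_with_other(values, n=10):
    """Return the n most common values plus an ('Other values (k)', count) row."""
    counts = Counter(v for v in values if v is not None)
    top = counts.most_common(n)
    rest = len(counts) - len(top)  # number of values not listed individually
    rest_total = sum(counts.values()) - sum(c for _, c in top)
    rows = list(top)
    if rest:
        rows.append((f"Other values ({rest})", rest_total))
    return rows


print(top_n_with_other(["Rotaliida", "Diptera", "Rotaliida", "Cetacea"], n=1))
```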

family
Text

Missing 

Distinct4830
Distinct (%)1.0%
Missing258765
Missing (%)35.7%
Memory size5.5 MiB

Length

Max length25
Median length21
Mean length12.53716749
Min length5

Characters and Unicode

Total characters5839098
Distinct characters52
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique637
Unique (%)0.1%

Sample

1st rowBovidae
2nd rowAcrochordiceratidae
3rd rowOrbitoclypeidae
4th rowOdontaspididae
5th rowEnteletidae
ValueCountFrequency (%)
subtriquetridae 14086 3.0%
milichiidae 13693 2.9%
procellariidae 9409 2.0%
lamnidae 7013 1.5%
carcharhinidae 5646 1.2%
anatidae 5251 1.1%
phocidae 4763 1.0%
vaginulinidae 3864 0.8%
equidae 3840 0.8%
physeteridae 3794 0.8%
Other values (4820) 394384 84.7%

Most occurring characters

ValueCountFrequency (%)
i 849890
14.6%
e 786222
13.5%
a 738375
12.6%
d 534378
9.2%
r 330492
 
5.7%
o 326737
 
5.6%
l 281417
 
4.8%
t 274019
 
4.7%
n 228653
 
3.9%
c 216586
 
3.7%
Other values (42) 1272329
21.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5373355
92.0%
Uppercase Letter 465743
 
8.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 849890
15.8%
e 786222
14.6%
a 738375
13.7%
d 534378
9.9%
r 330492
 
6.2%
o 326737
 
6.1%
l 281417
 
5.2%
t 274019
 
5.1%
n 228653
 
4.3%
c 216586
 
4.0%
Other values (16) 806586
15.0%
Uppercase Letter
ValueCountFrequency (%)
P 63107
13.5%
C 50262
10.8%
S 46268
9.9%
M 38138
 
8.2%
A 35135
 
7.5%
L 27292
 
5.9%
T 27199
 
5.8%
H 23746
 
5.1%
E 21742
 
4.7%
B 20654
 
4.4%
Other values (16) 112200
24.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 5839098
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 849890
14.6%
e 786222
13.5%
a 738375
12.6%
d 534378
9.2%
r 330492
 
5.7%
o 326737
 
5.6%
l 281417
 
4.8%
t 274019
 
4.7%
n 228653
 
3.9%
c 216586
 
3.7%
Other values (42) 1272329
21.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5839098
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 849890
14.6%
e 786222
13.5%
a 738375
12.6%
d 534378
9.2%
r 330492
 
5.7%
o 326737
 
5.6%
l 281417
 
4.8%
t 274019
 
4.7%
n 228653
 
3.9%
c 216586
 
3.7%
Other values (42) 1272329
21.8%
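Distinct and Unique count different things: Distinct is the number of different values (4830 families), while Unique appears to be the number of values that occur exactly once (637), with Unique (%) taken over non-missing records (637 / 465743 ≈ 0.1%). A minimal sketch of both counts; `distinct_and_unique` is a hypothetical helper.

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different values; unique = values seen exactly once."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct": len(counts),
        "unique": n_unique,
        "unique_pct": 100.0 * n_unique / len(present) if present else 0.0,
    }


print(distinct_and_unique(["Bovidae", "Lamnidae", "Bovidae", None, "Equidae"]))
```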

genus
Text

Missing 

Distinct20048
Distinct (%)4.2%
Missing245070
Missing (%)33.8%
Memory size5.5 MiB

Length

Max length23
Median length20
Mean length10.1276432
Min length3

Characters and Unicode

Total characters4855577
Distinct characters53
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique4473
Unique (%)0.9%

Sample

1st rowDamaliscus
2nd rowAcrochordiceras
3rd rowAsterocyclina
4th rowCarcharias
5th rowEnteletes
ValueCountFrequency (%)
genus 13850 2.9%
marrella 12281 2.6%
pterodroma 6789 1.4%
callophoca 3770 0.8%
physeterula 3029 0.6%
carcharhinus 2974 0.6%
australca 2250 0.5%
thambetochen 2208 0.5%
hustedia 2080 0.4%
branta 2051 0.4%
Other values (20038) 428156 89.3%

Most occurring characters

ValueCountFrequency (%)
a 524027
 
10.8%
i 409959
 
8.4%
o 399283
 
8.2%
e 377679
 
7.8%
r 355924
 
7.3%
s 324654
 
6.7%
l 308448
 
6.4%
n 254099
 
5.2%
t 240655
 
5.0%
u 219806
 
4.5%
Other values (43) 1441043
29.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4376122
90.1%
Uppercase Letter 479438
 
9.9%
Dash Punctuation 17
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 524027
12.0%
i 409959
9.4%
o 399283
9.1%
e 377679
 
8.6%
r 355924
 
8.1%
s 324654
 
7.4%
l 308448
 
7.0%
n 254099
 
5.8%
t 240655
 
5.5%
u 219806
 
5.0%
Other values (16) 961588
22.0%
Uppercase Letter
ValueCountFrequency (%)
P 67119
14.0%
C 57423
12.0%
M 39323
 
8.2%
A 37322
 
7.8%
S 33234
 
6.9%
G 31949
 
6.7%
H 25258
 
5.3%
T 25070
 
5.2%
B 24416
 
5.1%
L 23028
 
4.8%
Other values (16) 115296
24.0%
Dash Punctuation
ValueCountFrequency (%)
- 17
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4855560
> 99.9%
Common 17
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 524027
 
10.8%
i 409959
 
8.4%
o 399283
 
8.2%
e 377679
 
7.8%
r 355924
 
7.3%
s 324654
 
6.7%
l 308448
 
6.4%
n 254099
 
5.2%
t 240655
 
5.0%
u 219806
 
4.5%
Other values (42) 1441026
29.7%
Common
ValueCountFrequency (%)
- 17
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4855577
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 524027
 
10.8%
i 409959
 
8.4%
o 399283
 
8.2%
e 377679
 
7.8%
r 355924
 
7.3%
s 324654
 
6.7%
l 308448
 
6.4%
n 254099
 
5.2%
t 240655
 
5.0%
u 219806
 
4.5%
Other values (43) 1441043
29.7%

genericName
Text

Missing 

Distinct19254
Distinct (%)4.0%
Missing244897
Missing (%)33.8%
Memory size5.5 MiB

Length

Max length22
Median length19
Mean length10.00970995
Min length3

Characters and Unicode

Total characters4800767
Distinct characters55
Distinct categories3
Distinct scripts2
Distinct blocks2

Unique

Unique4453
Unique (%)0.9%

Sample

1st rowDamaliscus
2nd rowAcrochordiceras
3rd rowDiscocyclina
4th rowOdontaspis
5th rowEnteletes
ValueCountFrequency (%)
genus 13850 2.9%
marrella 12281 2.6%
pterodroma 7305 1.5%
callophoca 3770 0.8%
isurus 3463 0.7%
physeterula 3029 0.6%
carcharhinus 2930 0.6%
australca 2250 0.5%
thambetochen 2208 0.5%
hustedia 2082 0.4%
Other values (19244) 426443 88.9%

Most occurring characters

ValueCountFrequency (%)
a 519031
 
10.8%
i 403892
 
8.4%
o 387257
 
8.1%
e 374510
 
7.8%
r 356780
 
7.4%
s 320239
 
6.7%
l 307792
 
6.4%
n 251588
 
5.2%
t 236485
 
4.9%
u 219057
 
4.6%
Other values (45) 1424136
29.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4321139
90.0%
Uppercase Letter 479611
 
10.0%
Dash Punctuation 17
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 519031
12.0%
i 403892
9.3%
o 387257
9.0%
e 374510
 
8.7%
r 356780
 
8.3%
s 320239
 
7.4%
l 307792
 
7.1%
n 251588
 
5.8%
t 236485
 
5.5%
u 219057
 
5.1%
Other values (18) 944508
21.9%
Uppercase Letter
ValueCountFrequency (%)
P 64637
13.5%
C 57713
12.0%
M 37998
 
7.9%
A 37686
 
7.9%
G 33907
 
7.1%
S 33774
 
7.0%
H 25624
 
5.3%
T 25159
 
5.2%
B 24452
 
5.1%
L 22457
 
4.7%
Other values (16) 116204
24.2%
Dash Punctuation
ValueCountFrequency (%)
- 17
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4800750
> 99.9%
Common 17
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 519031
 
10.8%
i 403892
 
8.4%
o 387257
 
8.1%
e 374510
 
7.8%
r 356780
 
7.4%
s 320239
 
6.7%
l 307792
 
6.4%
n 251588
 
5.2%
t 236485
 
4.9%
u 219057
 
4.6%
Other values (44) 1424119
29.7%
Common
ValueCountFrequency (%)
- 17
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4800593
> 99.9%
None 174
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 519031
 
10.8%
i 403892
 
8.4%
o 387257
 
8.1%
e 374510
 
7.8%
r 356780
 
7.4%
s 320239
 
6.7%
l 307792
 
6.4%
n 251588
 
5.2%
t 236485
 
4.9%
u 219057
 
4.6%
Other values (43) 1423962
29.7%
None
ValueCountFrequency (%)
ë 164
94.3%
ö 10
 
5.7%
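genericName contains 174 characters outside ASCII (164 "ë" and 10 "ö"), which the report files under block "None"; the interpreted genus column above is pure ASCII. A minimal sketch for locating such characters and the values that contain them; `non_ascii_chars` is a hypothetical helper, and the sample strings are illustrative rather than taken from the dataset.

```python
from collections import Counter


def non_ascii_chars(values):
    """Count characters above code point 127 and collect the values containing them."""
    chars = Counter()
    examples = set()
    for value in values:
        if value is None:
            continue
        found = [ch for ch in value if ord(ch) > 127]
        if found:
            chars.update(found)
            examples.add(value)
    return chars, sorted(examples)


chars, examples = non_ascii_chars(["Carcharias", "Aëtobatus", None, "Löwenia"])
print(chars, examples)
```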

specificEpithet
Text

Missing 

Distinct21987
Distinct (%)8.0%
Missing449718
Missing (%)62.1%
Memory size5.5 MiB

Length

Max length20
Median length17
Mean length8.738418429
Min length2

Characters and Unicode

Total characters2401230
Distinct characters29
Distinct categories2
Distinct scripts2
Distinct blocks2

Unique

Unique6275
Unique (%)2.3%

Sample

1st rowlunatus
2nd rowhyatti
3rd rowsculpturata
4th rowcuspidata
5th rowrotundobesus
ValueCountFrequency (%)
phaeopygia 3232 1.2%
alba 2027 0.7%
megalodon 1648 0.6%
confluens 1438 0.5%
obscura 1243 0.5%
cahow 1050 0.4%
hastalis 917 0.3%
socialis 884 0.3%
varians 883 0.3%
paulus 879 0.3%
Other values (21977) 260589 94.8%

Most occurring characters

ValueCountFrequency (%)
a 304358
12.7%
i 265754
11.1%
s 232914
9.7%
e 187242
 
7.8%
n 166271
 
6.9%
r 155825
 
6.5%
u 136921
 
5.7%
o 136782
 
5.7%
l 131891
 
5.5%
t 131533
 
5.5%
Other values (19) 551739
23.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2401217
> 99.9%
Dash Punctuation 13
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 304358
12.7%
i 265754
11.1%
s 232914
9.7%
e 187242
 
7.8%
n 166271
 
6.9%
r 155825
 
6.5%
u 136921
 
5.7%
o 136782
 
5.7%
l 131891
 
5.5%
t 131533
 
5.5%
Other values (18) 551726
23.0%
Dash Punctuation
ValueCountFrequency (%)
- 13
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2401217
> 99.9%
Common 13
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 304358
12.7%
i 265754
11.1%
s 232914
9.7%
e 187242
 
7.8%
n 166271
 
6.9%
r 155825
 
6.5%
u 136921
 
5.7%
o 136782
 
5.7%
l 131891
 
5.5%
t 131533
 
5.5%
Other values (18) 551726
23.0%
Common
ValueCountFrequency (%)
- 13
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2401228
> 99.9%
None 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 304358
12.7%
i 265754
11.1%
s 232914
9.7%
e 187242
 
7.8%
n 166271
 
6.9%
r 155825
 
6.5%
u 136921
 
5.7%
o 136782
 
5.7%
l 131891
 
5.5%
t 131533
 
5.5%
Other values (17) 551737
23.0%
None
ValueCountFrequency (%)
ü 1
50.0%
ö 1
50.0%

infraspecificEpithet
Text

Missing 

Distinct1469
Distinct (%)23.3%
Missing718207
Missing (%)99.1%
Memory size5.5 MiB

Length

Max length18
Median length15
Mean length9.022536105
Min length2

Characters and Unicode

Total characters56851
Distinct characters26
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique540
Unique (%)8.6%

Sample

1st rowcooperensis
2nd rowsubdecorata
3rd rowadvena
4th rowconvexa
5th rowpoloumera
ValueCountFrequency (%)
burchellii 494 7.8%
antarctica 104 1.7%
inflata 67 1.1%
vancouveriensis 64 1.0%
mexicana 54 0.9%
ornata 50 0.8%
caurina 42 0.7%
erectus 39 0.6%
texana 33 0.5%
curta 32 0.5%
Other values (1459) 5322 84.5%

Most occurring characters

ValueCountFrequency (%)
a 8251
14.5%
i 6485
11.4%
s 4930
8.7%
e 4526
 
8.0%
n 4069
 
7.2%
t 3680
 
6.5%
r 3677
 
6.5%
l 3463
 
6.1%
c 3093
 
5.4%
u 2924
 
5.1%
Other values (16) 11753
20.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 56851
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 8251
14.5%
i 6485
11.4%
s 4930
8.7%
e 4526
 
8.0%
n 4069
 
7.2%
t 3680
 
6.5%
r 3677
 
6.5%
l 3463
 
6.1%
c 3093
 
5.4%
u 2924
 
5.1%
Other values (16) 11753
20.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 56851
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 8251
14.5%
i 6485
11.4%
s 4930
8.7%
e 4526
 
8.0%
n 4069
 
7.2%
t 3680
 
6.5%
r 3677
 
6.5%
l 3463
 
6.1%
c 3093
 
5.4%
u 2924
 
5.1%
Other values (16) 11753
20.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 56851
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 8251
14.5%
i 6485
11.4%
s 4930
8.7%
e 4526
 
8.0%
n 4069
 
7.2%
t 3680
 
6.5%
r 3677
 
6.5%
l 3463
 
6.1%
c 3093
 
5.4%
u 2924
 
5.1%
Other values (16) 11753
20.7%

taxonRank
Text

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size5.5 MiB

Length

Max length10
Median length7
Mean length6.306741955
Min length4

Characters and Unicode

Total characters4569285
Distinct characters21
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowKINGDOM
2nd rowSPECIES
3rd rowSPECIES
4th rowSPECIES
5th rowSPECIES
ValueCountFrequency (%)
species 268489 37.1%
genus 204821 28.3%
kingdom 184360 25.4%
class 34827 4.8%
family 11500 1.6%
order 7792 1.1%
phylum 6418 0.9%
subspecies 3525 0.5%
variety 2760 0.4%
form 16 < 0.1%

Most occurring characters

ValueCountFrequency (%)
S 822028
18.0%
E 759401
16.6%
I 470634
10.3%
G 389181
8.5%
N 389181
8.5%
C 306841
 
6.7%
P 278432
 
6.1%
U 214764
 
4.7%
M 202294
 
4.4%
O 192168
 
4.2%
Other values (11) 544361
11.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 4569285
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 822028
18.0%
E 759401
16.6%
I 470634
10.3%
G 389181
8.5%
N 389181
8.5%
C 306841
 
6.7%
P 278432
 
6.1%
U 214764
 
4.7%
M 202294
 
4.4%
O 192168
 
4.2%
Other values (11) 544361
11.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 4569285
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 822028
18.0%
E 759401
16.6%
I 470634
10.3%
G 389181
8.5%
N 389181
8.5%
C 306841
 
6.7%
P 278432
 
6.1%
U 214764
 
4.7%
M 202294
 
4.4%
O 192168
 
4.2%
Other values (11) 544361
11.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4569285
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S 822028
18.0%
E 759401
16.6%
I 470634
10.3%
G 389181
8.5%
N 389181
8.5%
C 306841
 
6.7%
P 278432
 
6.1%
U 214764
 
4.7%
M 202294
 
4.4%
O 192168
 
4.2%
Other values (11) 544361
11.9%

taxonomicStatus
Text

Missing 

Distinct3
Distinct (%)< 0.1%
Missing171789
Missing (%)23.7%
Memory size5.5 MiB

Length

Max length8
Median length8
Mean length7.856598018
Min length7

Characters and Unicode

Total characters4342491
Distinct characters15
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowACCEPTED
2nd rowACCEPTED
3rd rowSYNONYM
4th rowSYNONYM
5th rowACCEPTED
ValueCountFrequency (%)
accepted 431194 78.0%
synonym 79261 14.3%
doubtful 42264 7.6%

Most occurring characters

ValueCountFrequency (%)
C 862388
19.9%
E 862388
19.9%
T 473458
10.9%
D 473458
10.9%
A 431194
9.9%
P 431194
9.9%
Y 158522
 
3.7%
N 158522
 
3.7%
O 121525
 
2.8%
U 84528
 
1.9%
Other values (5) 285314
 
6.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 4342491
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C 862388
19.9%
E 862388
19.9%
T 473458
10.9%
D 473458
10.9%
A 431194
9.9%
P 431194
9.9%
Y 158522
 
3.7%
N 158522
 
3.7%
O 121525
 
2.8%
U 84528
 
1.9%
Other values (5) 285314
 
6.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 4342491
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 862388
19.9%
E 862388
19.9%
T 473458
10.9%
D 473458
10.9%
A 431194
9.9%
P 431194
9.9%
Y 158522
 
3.7%
N 158522
 
3.7%
O 121525
 
2.8%
U 84528
 
1.9%
Other values (5) 285314
 
6.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4342491
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C 862388
19.9%
E 862388
19.9%
T 473458
10.9%
D 473458
10.9%
A 431194
9.9%
P 431194
9.9%
Y 158522
 
3.7%
N 158522
 
3.7%
O 121525
 
2.8%
U 84528
 
1.9%
Other values (5) 285314
 
6.6%

datasetKey
Text

Constant 

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size5.5 MiB

Length

Max length36
Median length36
Mean length36
Min length36

Characters and Unicode

Total characters26082288
Distinct characters17
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowc8681cc2-9d0a-4c5f-b620-5c753abfe2bc
2nd rowc8681cc2-9d0a-4c5f-b620-5c753abfe2bc
3rd rowc8681cc2-9d0a-4c5f-b620-5c753abfe2bc
4th rowc8681cc2-9d0a-4c5f-b620-5c753abfe2bc
5th rowc8681cc2-9d0a-4c5f-b620-5c753abfe2bc
ValueCountFrequency (%)
c8681cc2-9d0a-4c5f-b620-5c753abfe2bc 724508 100.0%

Most occurring characters

ValueCountFrequency (%)
c 4347048
16.7%
- 2898032
11.1%
b 2173524
8.3%
2 2173524
8.3%
5 2173524
8.3%
8 1449016
 
5.6%
f 1449016
 
5.6%
a 1449016
 
5.6%
0 1449016
 
5.6%
6 1449016
 
5.6%
Other values (7) 5071556
19.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 12316636
47.2%
Lowercase Letter 10867620
41.7%
Dash Punctuation 2898032
 
11.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 2173524
17.6%
5 2173524
17.6%
8 1449016
11.8%
0 1449016
11.8%
6 1449016
11.8%
4 724508
 
5.9%
9 724508
 
5.9%
1 724508
 
5.9%
7 724508
 
5.9%
3 724508
 
5.9%
Lowercase Letter
ValueCountFrequency (%)
c 4347048
40.0%
b 2173524
20.0%
f 1449016
 
13.3%
a 1449016
 
13.3%
d 724508
 
6.7%
e 724508
 
6.7%
Dash Punctuation
ValueCountFrequency (%)
- 2898032
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 15214668
58.3%
Latin 10867620
41.7%

Most frequent character per script

Common
ValueCountFrequency (%)
- 2898032
19.0%
2 2173524
14.3%
5 2173524
14.3%
8 1449016
9.5%
0 1449016
9.5%
6 1449016
9.5%
4 724508
 
4.8%
9 724508
 
4.8%
1 724508
 
4.8%
7 724508
 
4.8%
Latin
ValueCountFrequency (%)
c 4347048
40.0%
b 2173524
20.0%
f 1449016
 
13.3%
a 1449016
 
13.3%
d 724508
 
6.7%
e 724508
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 26082288
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
c 4347048
16.7%
- 2898032
11.1%
b 2173524
8.3%
2 2173524
8.3%
5 2173524
8.3%
8 1449016
 
5.6%
f 1449016
 
5.6%
a 1449016
 
5.6%
0 1449016
 
5.6%
6 1449016
 
5.6%
Other values (7) 5071556
19.4%
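datasetKey holds the same 36-character string in every row, hence 724508 × 36 = 26082288 total characters. A minimal sketch that checks a column is constant and that its value parses as a UUID with Python's standard `uuid` module; `check_constant_uuid` is a hypothetical helper.

```python
import uuid


def check_constant_uuid(values):
    """Return the single UUID shared by all rows, or raise if the column is not constant."""
    distinct = set(values)
    if len(distinct) != 1:
        raise ValueError(f"expected one distinct value, found {len(distinct)}")
    (only,) = distinct
    return uuid.UUID(only)  # raises ValueError if the string is not a valid UUID


key = check_constant_uuid(["c8681cc2-9d0a-4c5f-b620-5c753abfe2bc"] * 3)
print(key, key.version)
```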

publishingCountry
Text

Constant 

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size5.5 MiB

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters1449016
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowUS
2nd rowUS
3rd rowUS
4th rowUS
5th rowUS
ValueCountFrequency (%)
us 724508 100.0%

Most occurring characters

ValueCountFrequency (%)
U 724508
50.0%
S 724508
50.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 1449016
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
U 724508
50.0%
S 724508
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1449016
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
U 724508
50.0%
S 724508
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1449016
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
U 724508
50.0%
S 724508
50.0%
Distinct37858
Distinct (%)5.2%
Missing0
Missing (%)0.0%
Memory size5.5 MiB

Length

Max length24
Median length24
Mean length23.99520778
Min length20

Characters and Unicode

Total characters17384720
Distinct characters15
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique984
Unique (%)0.1%

Sample

1st row2024-12-02T10:16:26.190Z
2nd row2024-12-02T10:16:26.321Z
3rd row2024-12-02T10:16:26.322Z
4th row2024-12-02T10:16:26.322Z
5th row2024-12-02T10:16:26.323Z
ValueCountFrequency (%)
2024-12-02t10:17:03.880z 100 < 0.1%
2024-12-02t10:17:08.512z 92 < 0.1%
2024-12-02t10:17:04.870z 87 < 0.1%
2024-12-02t10:17:05.654z 87 < 0.1%
2024-12-02t10:16:52.136z 85 < 0.1%
2024-12-02t10:16:59.768z 85 < 0.1%
2024-12-02t10:17:07.114z 85 < 0.1%
2024-12-02t10:16:58.778z 84 < 0.1%
2024-12-02t10:17:03.172z 84 < 0.1%
2024-12-02t10:17:08.976z 83 < 0.1%
Other values (37848) 723636 99.9%

Most occurring characters

ValueCountFrequency (%)
2 3187397
18.3%
0 2663851
15.3%
1 2462647
14.2%
- 1449016
8.3%
: 1449016
8.3%
4 1185381
 
6.8%
6 810755
 
4.7%
T 724508
 
4.2%
Z 724508
 
4.2%
. 723640
 
4.2%
Other values (5) 2004001
11.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 12314032
70.8%
Other Punctuation 2172656
 
12.5%
Dash Punctuation 1449016
 
8.3%
Uppercase Letter 1449016
 
8.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 3187397
25.9%
0 2663851
21.6%
1 2462647
20.0%
4 1185381
 
9.6%
6 810755
 
6.6%
7 497480
 
4.0%
5 488259
 
4.0%
3 420464
 
3.4%
9 301334
 
2.4%
8 296464
 
2.4%
Other Punctuation
ValueCountFrequency (%)
: 1449016
66.7%
. 723640
33.3%
Uppercase Letter
ValueCountFrequency (%)
T 724508
50.0%
Z 724508
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 1449016
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 15935704
91.7%
Latin 1449016
 
8.3%

Most frequent character per script

Common
ValueCountFrequency (%)
2 3187397
20.0%
0 2663851
16.7%
1 2462647
15.5%
- 1449016
9.1%
: 1449016
9.1%
4 1185381
 
7.4%
6 810755
 
5.1%
. 723640
 
4.5%
7 497480
 
3.1%
5 488259
 
3.1%
Other values (3) 1018262
 
6.4%
Latin
ValueCountFrequency (%)
T 724508
50.0%
Z 724508
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 17384720
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 3187397
18.3%
0 2663851
15.3%
1 2462647
14.2%
- 1449016
8.3%
: 1449016
8.3%
4 1185381
 
6.8%
6 810755
 
4.7%
T 724508
 
4.2%
Z 724508
 
4.2%
. 723640
 
4.2%
Other values (5) 2004001
11.5%
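The counts above show 'T' and 'Z' exactly once per record (724,508 each) but '.' only 723,640 times, so 868 values apparently lack fractional seconds, which also fits the 20-character minimum length against the typical 24. A parser for this column should therefore accept both forms. A minimal sketch, assuming UTC ISO 8601 strings like the sample rows; `parse_gbif_timestamp` is a hypothetical helper name:

```python
from datetime import datetime

def parse_gbif_timestamp(text):
    """Parse a UTC ISO 8601 string, with or without fractional seconds.

    A trailing 'Z' is rewritten as '+00:00' because datetime.fromisoformat
    only accepts 'Z' directly from Python 3.11 onwards.
    """
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)

stamps = [
    "2024-12-02T10:16:26.190Z",  # usual 24-character form
    "2024-12-02T10:17:08.976Z",
    "2024-12-02T10:16:26Z",      # hypothetical 20-character form without milliseconds
]
parsed = [parse_gbif_timestamp(s) for s in stamps]
span = max(parsed) - min(parsed)
print(span.total_seconds())  # seconds between the earliest and latest stamp
```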

distanceFromCentroidInMeters
Real number (ℝ)

Missing 

Distinct149
Distinct (%)23.1%
Missing723864
Missing (%)99.9%
Infinite0
Infinite (%)0.0%
Mean2256.841767
Minimum0
Maximum4992.37105
Zeros6
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum0
5-th percentile857.2535536
Q1857.2535536
median2818.630536
Q32818.630536
95-th percentile4618.527309
Maximum4992.37105
Range4992.37105
Interquartile range (IQR)1961.376982

Descriptive statistics

Standard deviation1312.822223
Coefficient of variation (CV)0.5817076951
Kurtosis-1.028590401
Mean2256.841767
Median Absolute Deviation (MAD)1402.236206
Skewness0.30591539
Sum1453406.098
Variance1723502.188
MonotonicityNot monotonic
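The dispersion figures above follow standard definitions: CV is the standard deviation divided by the mean (1312.822223 / 2256.841767 ≈ 0.5817) and the IQR is Q3 minus Q1 (2818.630536 - 857.2535536 = 1961.376982). A minimal sketch with the standard `statistics` module, assuming missing values are dropped first and that the report's quartiles use linear interpolation between order statistics (which is what `method="inclusive"` computes); `dispersion_summary` is a hypothetical helper:

```python
import statistics

def dispersion_summary(values):
    """Mean, sample standard deviation, CV and IQR of the non-missing values."""
    data = [v for v in values if v is not None]   # drop missing entries first
    mean = statistics.fmean(data)
    std = statistics.stdev(data)                  # sample std (divides by n - 1)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"mean": mean, "std": std, "cv": std / mean, "iqr": q3 - q1}

print(dispersion_summary([1, 2, 3, 4, 5, None]))
# The reported CV is simply std / mean:
print(round(1312.822223 / 2256.841767, 4))
```

Ties dominate this column: 226 of the 644 non-missing values equal 857.2535536 and 202 equal 2818.630536, which is why the 5th percentile coincides with Q1 and the median with Q3.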
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
857.2535536 226
 
< 0.1%
2818.630536 202
 
< 0.1%
4618.527309 27
 
< 0.1%
3800.284004 12
 
< 0.1%
1824.519626 10
 
< 0.1%
0 6
 
< 0.1%
1543.140798 6
 
< 0.1%
4852.601363 5
 
< 0.1%
3114.471841 4
 
< 0.1%
3029.93085 3
 
< 0.1%
Other values (139) 143
 
< 0.1%
(Missing) 723864
99.9%
ValueCountFrequency (%)
0 6
< 0.1%
253.452652 1
 
< 0.1%
533.2556305 1
 
< 0.1%
599.6747027 1
 
< 0.1%
605.9334686 1
 
< 0.1%
ValueCountFrequency (%)
4992.37105 1
< 0.1%
4985.80659 1
< 0.1%
4984.258263 1
< 0.1%
4978.129443 1
< 0.1%
4968.052222 1
< 0.1%

issue
Text

Distinct154
Distinct (%)< 0.1%
Missing193
Missing (%)< 0.1%
Memory size5.5 MiB

Length

Max length186
Median length181
Mean length68.38031105
Min length17

Characters and Unicode

Total characters49528885
Distinct characters28
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique22
Unique (%)< 0.1%

Sample

1st rowOCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT;GEODETIC_DATUM_ASSUMED_WGS84;TAXON_MATCH_NONE
2nd rowOCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT
3rd rowOCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT
4th rowOCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT;CONTINENT_DERIVED_FROM_COUNTRY
5th rowOCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT
ValueCountFrequency (%)
occurrence_status_inferred_from_individual_count 288609
39.8%
occurrence_status_inferred_from_individual_count;taxon_match_higherrank 165166
22.8%
occurrence_status_inferred_from_individual_count;taxon_match_none 89011
 
12.3%
occurrence_status_inferred_from_individual_count;geodetic_datum_assumed_wgs84;taxon_match_none 34505
 
4.8%
occurrence_status_inferred_from_individual_count;continent_derived_from_country 25422
 
3.5%
occurrence_status_inferred_from_individual_count;geodetic_datum_assumed_wgs84;continent_coordinate_mismatch;taxon_match_none 15005
 
2.1%
occurrence_status_inferred_from_individual_count;recorded_date_mismatch 12501
 
1.7%
occurrence_status_inferred_from_individual_count;geodetic_datum_assumed_wgs84;geodetic_datum_invalid;taxon_match_none 11612
 
1.6%
occurrence_status_inferred_from_individual_count;taxon_match_fuzzy 10754
 
1.5%
occurrence_status_inferred_from_individual_count;continent_derived_from_country;taxon_match_higherrank 10043
 
1.4%
Other values (144) 61687
 
8.5%
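These value counts treat each full combination of GBIF interpretation flags as one category (and the report lowercases the values), so an individual flag such as TAXON_MATCH_NONE is spread across several rows. Splitting on ';' gives per-flag totals instead. A minimal sketch; `count_issue_flags` is a hypothetical helper, not part of any GBIF library:

```python
from collections import Counter

def count_issue_flags(issue_values):
    """Count individual flags in semicolon-delimited 'issue' strings."""
    counts = Counter()
    for value in issue_values:
        if not value:  # skip missing or empty entries
            continue
        counts.update(flag for flag in value.split(";") if flag)
    return counts

rows = [  # the five sample rows shown above
    "OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT;GEODETIC_DATUM_ASSUMED_WGS84;TAXON_MATCH_NONE",
    "OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT",
    "OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT",
    "OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT;CONTINENT_DERIVED_FROM_COUNTRY",
    "OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT",
]
for flag, n in count_issue_flags(rows).most_common():
    print(flag, n)
```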

Most occurring characters

ValueCountFrequency (%)
_ 5044685
10.2%
R 4292364
 
8.7%
N 4233244
 
8.5%
E 3964433
 
8.0%
C 3659582
 
7.4%
I 3552475
 
7.2%
T 3530938
 
7.1%
U 3207403
 
6.5%
O 3183427
 
6.4%
D 2832439
 
5.7%
Other values (18) 12027895
24.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 43643926
88.1%
Connector Punctuation 5044685
 
10.2%
Other Punctuation 633292
 
1.3%
Decimal Number 206982
 
0.4%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
R 4292364
9.8%
N 4233244
9.7%
E 3964433
9.1%
C 3659582
8.4%
I 3552475
8.1%
T 3530938
8.1%
U 3207403
 
7.3%
O 3183427
 
7.3%
D 2832439
 
6.5%
A 2762222
 
6.3%
Other values (14) 8425399
19.3%
Decimal Number
ValueCountFrequency (%)
8 103491
50.0%
4 103491
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 5044685
100.0%
Other Punctuation
ValueCountFrequency (%)
; 633292
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 43643926
88.1%
Common 5884959
 
11.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
R 4292364
9.8%
N 4233244
9.7%
E 3964433
9.1%
C 3659582
8.4%
I 3552475
8.1%
T 3530938
8.1%
U 3207403
 
7.3%
O 3183427
 
7.3%
D 2832439
 
6.5%
A 2762222
 
6.3%
Other values (14) 8425399
19.3%
Common
ValueCountFrequency (%)
_ 5044685
85.7%
; 633292
 
10.8%
8 103491
 
1.8%
4 103491
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 49528885
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
_ 5044685
10.2%
R 4292364
 
8.7%
N 4233244
 
8.5%
E 3964433
 
8.0%
C 3659582
 
7.4%
I 3552475
 
7.2%
T 3530938
 
7.1%
U 3207403
 
6.5%
O 3183427
 
6.4%
D 2832439
 
5.7%
Other values (18) 12027895
24.3%

mediaType
Text

Missing 

Distinct58
Distinct (%)0.1%
Missing637882
Missing (%)88.0%
Memory size5.5 MiB

Length

Max length1110
Median length1099
Mean length20.60165539
Min length10

Characters and Unicode

Total characters1784639
Distinct characters10
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique21
Unique (%)< 0.1%

Sample

1st rowStillImage
2nd rowStillImage
3rd rowStillImage
4th rowStillImage
5th rowStillImage;StillImage
ValueCountFrequency (%)
stillimage 36835
42.5%
stillimage;stillimage 35396
40.9%
stillimage;stillimage;stillimage;stillimage 5461
 
6.3%
stillimage;stillimage;stillimage 5225
 
6.0%
stillimage;stillimage;stillimage;stillimage;stillimage 2625
 
3.0%
stillimage;stillimage;stillimage;stillimage;stillimage;stillimage 354
 
0.4%
stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage 145
 
0.2%
stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage 132
 
0.2%
stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage 79
 
0.1%
stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage;stillimage 74
 
0.1%
Other values (48) 300
 
0.3%
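mediaType stores one entry per attached media item, joined with ';'. The character counts that follow are consistent with that reading: the 86,626 non-missing records (724,508 minus 637,882 missing) plus the 83,489 semicolons give 170,115, exactly the count of 'S' and of 'I', i.e. one StillImage per item. A minimal sketch for per-record item counts, assuming that format; `media_items_per_record` is a hypothetical helper:

```python
from collections import Counter

def media_items_per_record(media_type):
    """Number of media entries in a ';'-joined mediaType value (0 when missing)."""
    if not media_type:
        return 0
    return media_type.count(";") + 1

values = ["StillImage", "StillImage;StillImage", None, "StillImage;StillImage;StillImage"]
distribution = Counter(media_items_per_record(v) for v in values)
print(sorted(distribution.items()))  # (items per record, number of records)

# Consistency check with the report: records + separators = total items.
non_missing = 724508 - 637882
print(non_missing + 83489)           # matches the 170,115 'StillImage' tokens
```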

Most occurring characters

ValueCountFrequency (%)
l 340230
19.1%
S 170115
9.5%
t 170115
9.5%
i 170115
9.5%
I 170115
9.5%
m 170115
9.5%
a 170115
9.5%
g 170115
9.5%
e 170115
9.5%
; 83489
 
4.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1360920
76.3%
Uppercase Letter 340230
 
19.1%
Other Punctuation 83489
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l 340230
25.0%
t 170115
12.5%
i 170115
12.5%
m 170115
12.5%
a 170115
12.5%
g 170115
12.5%
e 170115
12.5%
Uppercase Letter
ValueCountFrequency (%)
S 170115
50.0%
I 170115
50.0%
Other Punctuation
ValueCountFrequency (%)
; 83489
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1701150
95.3%
Common 83489
 
4.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
l 340230
20.0%
S 170115
10.0%
t 170115
10.0%
i 170115
10.0%
I 170115
10.0%
m 170115
10.0%
a 170115
10.0%
g 170115
10.0%
e 170115
10.0%
Common
ValueCountFrequency (%)
; 83489
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1784639
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l 340230
19.1%
S 170115
9.5%
t 170115
9.5%
i 170115
9.5%
I 170115
9.5%
m 170115
9.5%
a 170115
9.5%
g 170115
9.5%
e 170115
9.5%
; 83489
 
4.7%
Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size707.7 KiB
False
620570 
True
103938 
ValueCountFrequency (%)
False 620570
85.7%
True 103938
 
14.3%

hasGeospatialIssues
Boolean

Imbalance 

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size707.7 KiB
False
723170 
True
 
1338
ValueCountFrequency (%)
False 723170
99.8%
True 1338
 
0.2%
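The 98.1% imbalance alert can be reproduced by assuming the score is one minus the Shannon entropy of the value counts, normalised by the log of the number of classes. This is a reconstruction that matches the reported figure, not code quoted from ydata-profiling; `imbalance_score` is a hypothetical name.

```python
import math

def imbalance_score(counts):
    """1 minus the Shannon entropy of the class counts, normalised by log(k).

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

print(round(imbalance_score([723170, 1338]), 3))    # hasGeospatialIssues
print(round(imbalance_score([620570, 103938]), 3))  # Boolean column above it
```

Under this assumption hasGeospatialIssues scores about 0.981, matching the alert, while the preceding Boolean column (620,570 False vs 103,938 True) scores only about 0.407.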

taxonKey
Real number (ℝ)

Zeros 

Distinct65365
Distinct (%)9.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4156268.397
Minimum0
Maximum12387090
Zeros171789
Zeros (%)23.7%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum0
5-th percentile0
Q16
median4806932
Q37794651
95-th percentile9255998
Maximum12387090
Range12387090
Interquartile range (IQR)7794645

Descriptive statistics

Standard deviation3563951.427
Coefficient of variation (CV)0.857488277
Kurtosis-1.280507827
Mean4156268.397
Median Absolute Deviation (MAD)3776216.5
Skewness0.1911040766
Sum3.011249704 × 1012
Variance1.270174977 × 1013
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 171789
 
23.7%
216 16872
 
2.3%
8513230 13693
 
1.9%
4806028 12281
 
1.7%
6 11457
 
1.6%
359 4656
 
0.6%
44 4268
 
0.6%
729 3674
 
0.5%
353 3566
 
0.5%
2481460 3232
 
0.4%
Other values (65355) 479020
66.1%
ValueCountFrequency (%)
0 171789
23.7%
1 1114
 
0.2%
6 11457
 
1.6%
42 952
 
0.1%
43 51
 
< 0.1%
ValueCountFrequency (%)
12387090 1
 
< 0.1%
12385426 4
< 0.1%
12385220 2
 
< 0.1%
12383973 1
 
< 0.1%
12379591 6
< 0.1%

acceptedTaxonKey
Real number (ℝ)

Missing 

Distinct58335
Distinct (%)10.6%
Missing171789
Missing (%)23.7%
Infinite0
Infinite (%)0.0%
Mean5515085.25
Minimum1
Maximum12385426
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum1
5-th percentile216
Q13249393
median4941659
Q38513230
95-th percentile9626241
Maximum12385426
Range12385425
Interquartile range (IQR)5263837

Descriptive statistics

Standard deviation3184869.125
Coefficient of variation (CV)0.5774832084
Kurtosis-0.8885403732
Mean5515085.25
Median Absolute Deviation (MAD)2688948
Skewness-0.1769613565
Sum3.048292404 × 1012
Variance1.014339134 × 1013
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
216 16872
 
2.3%
8513230 13693
 
1.9%
4806028 12281
 
1.7%
6 11457
 
1.6%
359 4656
 
0.6%
44 4268
 
0.6%
729 3674
 
0.5%
353 3566
 
0.5%
2481460 3232
 
0.4%
4832444 3022
 
0.4%
Other values (58325) 475998
65.7%
(Missing) 171789
 
23.7%
ValueCountFrequency (%)
1 1114
 
0.2%
6 11457
1.6%
42 952
 
0.1%
43 51
 
< 0.1%
44 4268
 
0.6%
ValueCountFrequency (%)
12385426 4
 
< 0.1%
12385220 2
 
< 0.1%
12379591 6
 
< 0.1%
12362277 15
< 0.1%
12358726 5
 
< 0.1%
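The 171,789 zeros in taxonKey match the 171,789 missing acceptedTaxonKey values exactly, which suggests (but does not prove) that 0 acts as a "no usable match" placeholder rather than a real key. Note also that Distinct (%) is computed over non-missing values: 58,335 / (724,508 - 171,789) ≈ 10.6%, whereas taxonKey, with no missing cells, gives 65,365 / 724,508 ≈ 9.0%. A minimal sketch of both, assuming 0 is the sentinel; the helper names are hypothetical:

```python
def sentinel_to_missing(keys, sentinel=0):
    """Replace an assumed sentinel key (default 0) with None so it counts as missing."""
    return [None if k == sentinel else k for k in keys]

def distinct_pct(values):
    """Distinct values as a percentage of the non-missing values."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return 100.0 * len(set(present)) / len(present)

keys = [0, 216, 216, 8513230, 0, 6]
cleaned = sentinel_to_missing(keys)
print(cleaned)                # zeros become None
print(distinct_pct(cleaned))  # 3 distinct out of 4 present values

# Reproducing the report's figures from its own counts:
print(round(100 * 58335 / (724508 - 171789), 1))  # acceptedTaxonKey
print(round(100 * 65365 / 724508, 1))             # taxonKey
```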

kingdomKey
Real number (ℝ)

Zeros 

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.303661243
Minimum0
Maximum7
Zeros171929
Zeros (%)23.7%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum0
5-th percentile0
Q11
median1
Q31
95-th percentile6
Maximum7
Range7
Interquartile range (IQR)0

Descriptive statistics

Standard deviation1.508442348
Coefficient of variation (CV)1.157081531
Kurtosis2.902387583
Mean1.303661243
Median Absolute Deviation (MAD)0
Skewness1.923133372
Sum944513
Variance2.275398317
MonotonicityNot monotonic
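A MAD of 0 alongside a standard deviation of about 1.5 is expected here: the median is 1 and 61.6% of the values equal it, so more than half of the absolute deviations from the median are zero. A minimal sketch, assuming the report uses the unscaled definition (the median of |x - median|, without the 1.4826 normal-consistency factor); `median_absolute_deviation` is a hypothetical helper:

```python
import statistics

def median_absolute_deviation(values):
    """Unscaled MAD: the median of absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

# A small hypothetical sample shaped like kingdomKey: mostly 1, some 0, 4 and 6.
sample = [0, 1, 1, 1, 1, 4, 6]
print(median_absolute_deviation(sample))  # four of the seven deviations are zero
print(statistics.stdev(sample))           # yet the spread is clearly non-zero
```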
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
1 446288
61.6%
0 171929
 
23.7%
4 69124
 
9.5%
6 36324
 
5.0%
3 502
 
0.1%
7 287
 
< 0.1%
5 54
 
< 0.1%
ValueCountFrequency (%)
0 171929
 
23.7%
1 446288
61.6%
3 502
 
0.1%
4 69124
 
9.5%
5 54
 
< 0.1%
ValueCountFrequency (%)
7 287
 
< 0.1%
6 36324
5.0%
5 54
 
< 0.1%
4 69124
9.5%
3 502
 
0.1%
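kingdomKey values are identifiers rather than quantities, so the mean and skewness above say little on their own. Assuming the GBIF Backbone Taxonomy's usual kingdom keys (0 incertae sedis, 1 Animalia, 2 Archaea, 3 Bacteria, 4 Chromista, 5 Fungi, 6 Plantae, 7 Protozoa, 8 Viruses; worth confirming against the GBIF species API before relying on it), the counts can be labelled as follows. `GBIF_KINGDOMS` and `label_kingdoms` are hypothetical names.

```python
from collections import Counter

# Assumed mapping of GBIF Backbone Taxonomy kingdom keys to names.
GBIF_KINGDOMS = {
    0: "incertae sedis",
    1: "Animalia",
    2: "Archaea",
    3: "Bacteria",
    4: "Chromista",
    5: "Fungi",
    6: "Plantae",
    7: "Protozoa",
    8: "Viruses",
}

def label_kingdoms(key_counts):
    """Translate {kingdomKey: count} into {kingdom name: count}."""
    labelled = Counter()
    for key, count in key_counts.items():
        labelled[GBIF_KINGDOMS.get(key, f"unknown ({key})")] += count
    return labelled

counts = {1: 446288, 0: 171929, 4: 69124, 6: 36324, 3: 502, 7: 287, 5: 54}
for name, n in label_kingdoms(counts).most_common():
    print(f"{name:15} {n:>7}")
```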

phylumKey
Real number (ℝ)

Missing 

Distinct40
Distinct (%)< 0.1%
Missing192842
Missing (%)26.6%
Infinite0
Infinite (%)0.0%
Mean1373510.734
Minimum9
Maximum12228025
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum9
5-th percentile44
Q144
median53
Q3110
95-th percentile8376456
Maximum12228025
Range12228016
Interquartile range (IQR)66

Descriptive statistics

Standard deviation3060656.412
Coefficient of variation (CV)2.228345463
Kurtosis1.208523107
Mean1373510.734
Median Absolute Deviation (MAD)9
Skewness1.786678435
Sum7.302489577 × 1011
Variance9.367617675 × 1012
MonotonicityNot monotonic
Histogram with fixed size bins (bins=40)
ValueCountFrequency (%)
44 148527
20.5%
54 101505
14.0%
52 66708
 
9.2%
8376456 66099
 
9.1%
110 65633
 
9.1%
50 27100
 
3.7%
7707728 21340
 
2.9%
53 13677
 
1.9%
43 6914
 
1.0%
42 3027
 
0.4%
Other values (30) 11136
 
1.5%
(Missing) 192842
26.6%
ValueCountFrequency (%)
9 20
 
< 0.1%
14 46
 
< 0.1%
32 1
 
< 0.1%
33 225
< 0.1%
35 20
 
< 0.1%
ValueCountFrequency (%)
12228025 12
 
< 0.1%
9778081 1
 
< 0.1%
8770992 11
 
< 0.1%
8376456 66099
9.1%
8173593 15
 
< 0.1%

classKey
Real number (ℝ)

Missing 

Distinct92
Distinct (%)< 0.1%
Missing272566
Missing (%)37.6%
Infinite0
Infinite (%)0.0%
Mean1466432.75
Minimum116
Maximum12259753
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum116
5-th percentile121
Q1210
median220
Q3359
95-th percentile9273948
Maximum12259753
Range12259637
Interquartile range (IQR)149

Descriptive statistics

Standard deviation3184973.092
Coefficient of variation (CV)2.17191896
Kurtosis1.344083486
Mean1466432.75
Median Absolute Deviation (MAD)83
Skewness1.775836497
Sum6.627425498 × 1011
Variance1.01440536 × 1013
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
359 59795
 
8.3%
7434778 42882
 
5.9%
210 39551
 
5.5%
212 34584
 
4.8%
216 32733
 
4.5%
225 24245
 
3.3%
353 23481
 
3.2%
121 23303
 
3.2%
9273948 22315
 
3.1%
137 22257
 
3.1%
Other values (82) 126796
17.5%
(Missing) 272566
37.6%
ValueCountFrequency (%)
116 36
 
< 0.1%
120 659
 
0.1%
121 23303
3.2%
125 9
 
< 0.1%
126 11
 
< 0.1%
ValueCountFrequency (%)
12259753 1
 
< 0.1%
12203163 1
 
< 0.1%
12186859 12
 
< 0.1%
11733052 62
 
< 0.1%
11592253 1006
0.1%

orderKey
Real number (ℝ)

Missing 

Distinct484
Distinct (%)0.1%
Missing369296
Missing (%)51.0%
Infinite0
Infinite (%)0.0%
Mean3512590.676
Minimum370
Maximum12263124
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum370
5-th percentile509
Q1798
median1436
Q37692889
95-th percentile11151631
Maximum12263124
Range12262754
Interquartile range (IQR)7692091

Descriptive statistics

Standard deviation4380254.677
Coefficient of variation (CV)1.247015403
Kurtosis-1.502677218
Mean3512590.676
Median Absolute Deviation (MAD)799
Skewness0.5410560688
Sum1.247714359 × 1012
Variance1.918663103 × 1013
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
7692889 32460
 
4.5%
811 14185
 
2.0%
1419 14086
 
1.9%
1438 12424
 
1.7%
885 11376
 
1.6%
733 10382
 
1.4%
7192755 9895
 
1.4%
731 8981
 
1.2%
509 8715
 
1.2%
795 7870
 
1.1%
Other values (474) 224838
31.0%
(Missing) 369296
51.0%
ValueCountFrequency (%)
370 300
 
< 0.1%
371 3664
0.5%
376 8
 
< 0.1%
381 5
 
< 0.1%
392 635
 
0.1%
ValueCountFrequency (%)
12263124 1
 
< 0.1%
12261528 2195
0.3%
12260364 11
 
< 0.1%
12244639 2
 
< 0.1%
12243044 2
 
< 0.1%

familyKey
Real number (ℝ)

Missing 

Distinct4832
Distinct (%)1.0%
Missing258765
Missing (%)35.7%
Infinite0
Infinite (%)0.0%
Mean3036480.23
Minimum1895
Maximum12262968
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum1895
5-th percentile2918
Q17086
median3252093
Q34834682
95-th percentile8052057
Maximum12262968
Range12261073
Interquartile range (IQR)4827596

Descriptive statistics

Standard deviation2821037.453
Coefficient of variation (CV)0.9290485166
Kurtosis-0.5522487965
Mean3036480.23
Median Absolute Deviation (MAD)3242579
Skewness0.508338039
Sum1.414219412 × 1012
Variance7.958252313 × 1012
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
3255384 14086
 
1.9%
9496 13693
 
1.9%
9339 9409
 
1.3%
5888 7013
 
1.0%
2211 5646
 
0.8%
2986 5251
 
0.7%
5310 4763
 
0.7%
7923659 3864
 
0.5%
5479 3840
 
0.5%
5446 3794
 
0.5%
Other values (4822) 394384
54.4%
(Missing) 258765
35.7%
ValueCountFrequency (%)
1895 12
< 0.1%
1897 3
 
< 0.1%
1978 20
< 0.1%
1989 12
< 0.1%
2006 29
< 0.1%
ValueCountFrequency (%)
12262968 4
 
< 0.1%
12247189 9
 
< 0.1%
12246268 3
 
< 0.1%
12236981 32
< 0.1%
12234980 3
 
< 0.1%

genusKey
Real number (ℝ)

Missing 

Distinct20311
Distinct (%)4.2%
Missing245070
Missing (%)33.8%
Infinite0
Infinite (%)0.0%
Mean4935876.249
Minimum1000424
Maximum12385426
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum1000424
5-th percentile2278992
Q13251308
median4830391
Q34897257
95-th percentile8513230
Maximum12385426
Range11385002
Interquartile range (IQR)1645949

Descriptive statistics

Standard deviation2083699.037
Coefficient of variation (CV)0.4221538248
Kurtosis0.07746136144
Mean4935876.249
Median Absolute Deviation (MAD)598535
Skewness0.610001942
Sum2.366446637 × 1012
Variance4.341801678 × 1012
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
8513230 13693
 
1.9%
4806028 12281
 
1.7%
2481443 6789
 
0.9%
4833150 3770
 
0.5%
4832444 3029
 
0.4%
2417963 2974
 
0.4%
4848792 2250
 
0.3%
4851051 2208
 
0.3%
4870176 2080
 
0.3%
2498190 2051
 
0.3%
Other values (20301) 428313
59.1%
(Missing) 245070
33.8%
ValueCountFrequency (%)
1000424 11
 
< 0.1%
1003585 4
 
< 0.1%
1003655 29
< 0.1%
1003657 2
 
< 0.1%
1003659 3
 
< 0.1%
ValueCountFrequency (%)
12385426 4
< 0.1%
12385220 2
 
< 0.1%
12384711 5
< 0.1%
12379591 6
< 0.1%
12378210 2
 
< 0.1%

speciesKey
Real number (ℝ)

Missing 

Distinct45066
Distinct (%)16.4%
Missing450165
Missing (%)62.1%
Infinite0
Infinite (%)0.0%
Mean7340362.403
Minimum1003615
Maximum12353765
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size5.5 MiB

Quantile statistics

Minimum1003615
5-th percentile2441022
Q14977965
median8423909
Q39037391.5
95-th percentile11127348
Maximum12353765
Range11350150
Interquartile range (IQR)4059426.5

Descriptive statistics

Standard deviation2477246.806
Coefficient of variation (CV)0.3374829021
Kurtosis-0.4744667152
Mean7340362.403
Median Absolute Deviation (MAD)1034777
Skewness-0.5664310113
Sum2.013777043 × 1012
Variance6.136751738 × 1012
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
2481460 3232
 
0.4%
2481469 1833
 
0.3%
9413495 1648
 
0.2%
8819428 1401
 
0.2%
4941659 1115
 
0.2%
2481465 1050
 
0.1%
5816525 1044
 
0.1%
5816410 917
 
0.1%
4874907 816
 
0.1%
12198857 814
 
0.1%
Other values (45056) 260473
36.0%
(Missing) 450165
62.1%
ValueCountFrequency (%)
1003615 2
< 0.1%
1003627 2
< 0.1%
1003667 1
< 0.1%
1003733 1
< 0.1%
1003829 1
< 0.1%
ValueCountFrequency (%)
12353765 1
 
< 0.1%
12326275 2
< 0.1%
12279081 4
< 0.1%
12266515 1
 
< 0.1%
12266463 2
< 0.1%

species
Text

Missing 

Distinct45045
Distinct (%)16.4%
Missing450165
Missing (%)62.1%
Memory size5.5 MiB

Length

Max length40
Median length36
Mean length19.97971153
Min length9

Characters and Unicode

Total characters5481294
Distinct characters54
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique16582
Unique (%)6.0%

Sample

1st rowDamaliscus lunatus
2nd rowAcrochordiceras hyatti
3rd rowAsterocyclina minima
4th rowCarcharias tricuspidatus
5th rowEnteletes rotundobesus
ValueCountFrequency (%)
pterodroma 6569
 
1.2%
phaeopygia 3232
 
0.6%
carcharias 2554
 
0.5%
hustedia 2069
 
0.4%
alba 2031
 
0.4%
oxyrhina 1714
 
0.3%
lepidocyclina 1710
 
0.3%
hyopsodus 1699
 
0.3%
megalodon 1650
 
0.3%
bolivina 1496
 
0.3%
Other values (34798) 523962
95.5%
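These value counts are per lowercased word, not per name: genus names (pterodroma, carcharias) and epithets (phaeopygia, alba) are counted separately, so the list is not a ranking of species. A rough approximation of that tokenisation, assuming plain whitespace splitting, next to a whole-name tally that is usually what a species count needs; both helpers and the example rows are hypothetical:

```python
from collections import Counter

def word_counts(names):
    """Lowercase each non-missing name and count whitespace-separated words."""
    counts = Counter()
    for name in names:
        if name:
            counts.update(name.lower().split())
    return counts

def name_counts(names):
    """Count whole names, which is what a per-species tally usually needs."""
    return Counter(name for name in names if name)

names = [  # hypothetical rows
    "Pterodroma phaeopygia",
    "Pterodroma phaeopygia",
    "Pterodroma alba",
    "Carcharias tricuspidatus",
    None,
]
print(word_counts(names).most_common(2))
print(name_counts(names).most_common(1))
```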

Most occurring characters

ValueCountFrequency (%)
a 606832
 
11.1%
i 523413
 
9.5%
s 407795
 
7.4%
e 395509
 
7.2%
o 376936
 
6.9%
r 355606
 
6.5%
n 311565
 
5.7%
l 303075
 
5.5%
(space) 274343
 
5.0%
t 273072
 
5.0%
Other values (44) 1653148
30.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4932605
90.0%
Space Separator 274343
 
5.0%
Uppercase Letter 274343
 
5.0%
Dash Punctuation 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 606832
12.3%
i 523413
10.6%
s 407795
 
8.3%
e 395509
 
8.0%
o 376936
 
7.6%
r 355606
 
7.2%
n 311565
 
6.3%
l 303075
 
6.1%
t 273072
 
5.5%
u 258946
 
5.2%
Other values (16) 1119856
22.7%
Uppercase Letter
ValueCountFrequency (%)
P 40367
14.7%
C 33494
12.2%
A 21018
 
7.7%
S 18923
 
6.9%
M 18131
 
6.6%
T 14851
 
5.4%
H 14776
 
5.4%
L 14523
 
5.3%
E 13423
 
4.9%
B 13044
 
4.8%
Other values (16) 71793
26.2%
Space Separator
ValueCountFrequency (%)
(space) 274343
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5206948
95.0%
Common 274346
 
5.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 606832
11.7%
i 523413
 
10.1%
s 407795
 
7.8%
e 395509
 
7.6%
o 376936
 
7.2%
r 355606
 
6.8%
n 311565
 
6.0%
l 303075
 
5.8%
t 273072
 
5.2%
u 258946
 
5.0%
Other values (42) 1394199
26.8%
Common
ValueCountFrequency (%)
(space) 274343
> 99.9%
- 3
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5481294
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 606832
 
11.1%
i 523413
 
9.5%
s 407795
 
7.4%
e 395509
 
7.2%
o 376936
 
6.9%
r 355606
 
6.5%
n 311565
 
5.7%
l 303075
 
5.5%
(space) 274343
 
5.0%
t 273072
 
5.0%
Other values (44) 1653148
30.2%
Distinct58335
Distinct (%)10.6%
Missing171789
Missing (%)23.7%
Memory size5.5 MiB

Length

Max length124
Median length80
Mean length28.18863111
Min length4

Characters and Unicode

Total characters15580392
Distinct characters109
Distinct categories9
Distinct scripts2
Distinct blocks2

Unique

Unique20407
Unique (%)3.7%

Sample

1st rowDamaliscus lunatus (Burchell, 1823)
2nd rowAcrochordiceras hyatti Meek, 1877
3rd rowAsterocyclina minima (Cushman, 1918)
4th rowCarcharias tricuspidatus Day, 1878
5th rowEnteletes rotundobesus Cooper & Grant, 1976
ValueCountFrequency (%)
80122
 
4.1%
walcott 31024
 
1.6%
cooper 23991
 
1.2%
insecta 16885
 
0.9%
1912 16538
 
0.8%
cushman 16371
 
0.8%
grant 16172
 
0.8%
1976 14710
 
0.7%
genus 13850
 
0.7%
js 13693
 
0.7%
Other values (46962) 1721141
87.6%
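As with the species column, these value counts are per word, which is why author names (walcott, cooper, cushman) and years (1912, 1976) rank alongside taxon names. When authorship is needed as a separate field, a heuristic split on the trailing "Author, year" part can help. The regular expression below is an assumption that fits the sample rows, not a full nomenclatural parser: it expects authorship to begin with '(' or an uppercase letter and end in a four-digit year, and leaves anything else unsplit (for example authors written with a lowercase particle such as d'Orbigny). In zoological names, parentheses around the authorship conventionally mean the species was first described in a different genus. `AUTHORSHIP_RE` and `split_authorship` are hypothetical names.

```python
import re

# Heuristic: authorship starts with '(' or an uppercase letter and ends in a 4-digit year.
AUTHORSHIP_RE = re.compile(
    r"^(?P<name>.+?)\s+(?P<authorship>\(?[A-Z][^()]*?,\s*(?P<year>\d{4})\)?)$"
)

def split_authorship(full_name):
    """Split 'Genus epithet Author, year' into (name, authorship, year).

    Returns (full_name, None, None) when the heuristic does not match.
    """
    match = AUTHORSHIP_RE.match(full_name)
    if match is None:
        return full_name, None, None
    return match.group("name"), match.group("authorship"), int(match.group("year"))

for row in [
    "Damaliscus lunatus (Burchell, 1823)",
    "Acrochordiceras hyatti Meek, 1877",
    "Enteletes rotundobesus Cooper & Grant, 1976",
]:
    name, authorship, year = split_authorship(row)
    parenthesised = authorship is not None and authorship.startswith("(")
    print(name, "|", authorship, "|", year, "| parenthesised:", parenthesised)
```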

Most occurring characters

ValueCountFrequency (%)
(space) 1411778
 
9.1%
a 1238433
 
7.9%
e 968772
 
6.2%
i 903533
 
5.8%
o 818690
 
5.3%
r 805467
 
5.2%
s 768080
 
4.9%
n 717605
 
4.6%
l 693504
 
4.5%
t 605528
 
3.9%
Other values (99) 6649002
42.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 10325962
66.3%
Decimal Number 1830048
 
11.7%
Space Separator 1411778
 
9.1%
Uppercase Letter 1180045
 
7.6%
Other Punctuation 582213
 
3.7%
Open Punctuation 123706
 
0.8%
Close Punctuation 123706
 
0.8%
Dash Punctuation 2931
 
< 0.1%
Math Symbol 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1238433
12.0%
e 968772
9.4%
i 903533
 
8.8%
o 818690
 
7.9%
r 805467
 
7.8%
s 768080
 
7.4%
n 717605
 
6.9%
l 693504
 
6.7%
t 605528
 
5.9%
u 452103
 
4.4%
Other values (47) 2354247
22.8%
Uppercase Letter
ValueCountFrequency (%)
C 146305
12.4%
P 101665
 
8.6%
S 99149
 
8.4%
B 93739
 
7.9%
M 80929
 
6.9%
G 77205
 
6.5%
L 66062
 
5.6%
W 66034
 
5.6%
A 64513
 
5.5%
H 57879
 
4.9%
Other values (22) 326565
27.7%
Decimal Number
ValueCountFrequency (%)
1 540277
29.5%
9 307736
16.8%
8 288960
15.8%
7 138348
 
7.6%
6 114061
 
6.2%
5 103286
 
5.6%
2 99635
 
5.4%
3 90203
 
4.9%
0 73911
 
4.0%
4 73631
 
4.0%
Other Punctuation
ValueCountFrequency (%)
, 461741
79.3%
& 80122
 
13.8%
. 32324
 
5.6%
' 8024
 
1.4%
? 2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 1411778
100.0%
Open Punctuation
ValueCountFrequency (%)
( 123706
100.0%
Close Punctuation
ValueCountFrequency (%)
) 123706
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2931
100.0%
Math Symbol
ValueCountFrequency (%)
× 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 11506007
73.8%
Common 4074385
 
26.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1238433
 
10.8%
e 968772
 
8.4%
i 903533
 
7.9%
o 818690
 
7.1%
r 805467
 
7.0%
s 768080
 
6.7%
n 717605
 
6.2%
l 693504
 
6.0%
t 605528
 
5.3%
u 452103
 
3.9%
Other values (79) 3534292
30.7%
Common
ValueCountFrequency (%)
(space) 1411778
34.7%
1 540277
 
13.3%
, 461741
 
11.3%
9 307736
 
7.6%
8 288960
 
7.1%
7 138348
 
3.4%
( 123706
 
3.0%
) 123706
 
3.0%
6 114061
 
2.8%
5 103286
 
2.5%
Other values (10) 460786
 
11.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 15566857
99.9%
None 13535
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 1411778
 
9.1%
a 1238433
 
8.0%
e 968772
 
6.2%
i 903533
 
5.8%
o 818690
 
5.3%
r 805467
 
5.2%
s 768080
 
4.9%
n 717605
 
4.6%
l 693504
 
4.5%
t 605528
 
3.9%
Other values (61) 6635467
42.6%
None
ValueCountFrequency (%)
ü 3598
26.6%
ö 2698
19.9%
é 2171
16.0%
è 2104
15.5%
ú 1665
12.3%
ã 293
 
2.2%
ž 160
 
1.2%
ä 147
 
1.1%
å 122
 
0.9%
ë 98
 
0.7%
Other values (28) 479
 
3.5%
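Only about 0.1% of the characters (13,535) fall outside ASCII, mostly accented letters in author names such as ü, ö and é. If names from this column must be matched against sources that spell authors without diacritics, one common approach (an assumption about the use case, not something the report does) is to compare NFKD-decomposed strings with the combining marks removed. The helper names and the example author string are hypothetical.

```python
import unicodedata

def strip_diacritics(text):
    """Drop combining marks after NFKD decomposition, e.g. 'ü' -> 'u'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def non_ascii(text):
    """Return the characters of text outside the ASCII range."""
    return [ch for ch in text if ord(ch) > 127]

author = "Münster"  # hypothetical author string
print(non_ascii(author))         # the accented letter
print(strip_diacritics(author))  # accent-free form for matching
print(strip_diacritics("×"))     # the hybrid sign has no decomposition and is kept
```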
Distinct97401
Distinct (%)17.6%
Missing171332
Missing (%)23.6%
Memory size5.5 MiB

Length

Max length62
Median length56
Mean length18.07695742
Min length5

Characters and Unicode

Total characters9999739
Distinct characters72
Distinct categories9
Distinct scripts2
Distinct blocks1

Unique

Unique44766
Unique (%)8.1%

Sample

1st rowDamaliscus lunatus
2nd rowAcrochordiceras hyatti
3rd rowDiscocyclina (Asterocyclina) sculpturata
4th rowOdontaspis cuspidata
5th rowEnteletes rotundobesus
ValueCountFrequency (%)
sp 136960
 
12.1%
genus 56232
 
5.0%
insecta 16851
 
1.5%
splendens 12400
 
1.1%
marrella 12281
 
1.1%
pterodroma 7305
 
0.6%
var 6498
 
0.6%
callophoca 3770
 
0.3%
isurus 3463
 
0.3%
ostracoda 3391
 
0.3%
Other values (53913) 873954
77.1%

Most occurring characters

ValueCountFrequency (%)
a 1021294
 
10.2%
s 909134
 
9.1%
i 819278
 
8.2%
e 762530
 
7.6%
o 610330
 
6.1%
r 609311
 
6.1%
n 592254
 
5.9%
(space) 579929
 
5.8%
l 537519
 
5.4%
u 466436
 
4.7%
Other values (62) 3091724
30.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 8787040
87.9%
Space Separator 579929
 
5.8%
Uppercase Letter 575487
 
5.8%
Close Punctuation 22326
 
0.2%
Open Punctuation 22314
 
0.2%
Other Punctuation 10186
 
0.1%
Decimal Number 1938
 
< 0.1%
Dash Punctuation 518
 
< 0.1%
Math Symbol 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1021294
11.6%
s 909134
10.3%
i 819278
9.3%
e 762530
 
8.7%
o 610330
 
6.9%
r 609311
 
6.9%
n 592254
 
6.7%
l 537519
 
6.1%
u 466436
 
5.3%
t 465047
 
5.3%
Other values (16) 1993907
22.7%
Uppercase Letter
ValueCountFrequency (%)
G 79813
13.9%
P 69195
12.0%
C 60147
10.5%
A 39927
 
6.9%
M 39806
 
6.9%
S 35677
 
6.2%
B 27831
 
4.8%
H 26616
 
4.6%
T 26590
 
4.6%
I 25413
 
4.4%
Other values (16) 144472
25.1%
Decimal Number
ValueCountFrequency (%)
1 962
49.6%
2 543
28.0%
3 206
 
10.6%
4 92
 
4.7%
5 67
 
3.5%
6 38
 
2.0%
7 19
 
1.0%
8 5
 
0.3%
0 4
 
0.2%
9 2
 
0.1%
Other Punctuation
ValueCountFrequency (%)
. 10146
99.6%
' 21
 
0.2%
? 13
 
0.1%
* 5
 
< 0.1%
# 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 579929
100.0%
Close Punctuation
ValueCountFrequency (%)
) 22326
100.0%
Open Punctuation
ValueCountFrequency (%)
( 22314
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 518
100.0%
Math Symbol
ValueCountFrequency (%)
+ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 9362527
93.6%
Common 637212
 
6.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1021294
 
10.9%
s 909134
 
9.7%
i 819278
 
8.8%
e 762530
 
8.1%
o 610330
 
6.5%
r 609311
 
6.5%
n 592254
 
6.3%
l 537519
 
5.7%
u 466436
 
5.0%
t 465047
 
5.0%
Other values (42) 2569394
27.4%
Common
ValueCountFrequency (%)
(space) 579929
91.0%
) 22326
 
3.5%
( 22314
 
3.5%
. 10146
 
1.6%
1 962
 
0.2%
2 543
 
0.1%
- 518
 
0.1%
3 206
 
< 0.1%
4 92
 
< 0.1%
5 67
 
< 0.1%
Other values (10) 109
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9999739
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 1021294
 
10.2%
s 909134
 
9.1%
i 819278
 
8.2%
e 762530
 
7.6%
o 610330
 
6.1%
r 609311
 
6.1%
n 592254
 
5.9%
(space) 579929
 
5.8%
l 537519
 
5.4%
u 466436
 
4.7%
Other values (62) 3091724
30.9%

typifiedName
Text

Constant  Missing 

Distinct1
Distinct (%)14.3%
Missing724501
Missing (%)> 99.9%
Memory size5.5 MiB

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters28
Distinct characters4
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowType
2nd rowType
3rd rowType
4th rowType
5th rowType
ValueCountFrequency (%)
type 7
100.0%

Most occurring characters

ValueCountFrequency (%)
T 7
25.0%
y 7
25.0%
p 7
25.0%
e 7
25.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 21
75.0%
Uppercase Letter 7
 
25.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
y 7
33.3%
p 7
33.3%
e 7
33.3%
Uppercase Letter
ValueCountFrequency (%)
T 7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 28
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 7
25.0%
y 7
25.0%
p 7
25.0%
e 7
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 28
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
T 7
25.0%
y 7
25.0%
p 7
25.0%
e 7
25.0%

protocol
Text

Constant 

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size5.5 MiB

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters2173524
Distinct characters3
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowEML
2nd rowEML
3rd rowEML
4th rowEML
5th rowEML
ValueCountFrequency (%)
eml 724508
100.0%

Most occurring characters

ValueCountFrequency (%)
E 724508
33.3%
M 724508
33.3%
L 724508
33.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 2173524
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 724508
33.3%
M 724508
33.3%
L 724508
33.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 2173524
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 724508
33.3%
M 724508
33.3%
L 724508
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2173524
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E 724508
33.3%
M 724508
33.3%
L 724508
33.3%
Distinct37858
Distinct (%)5.2%
Missing0
Missing (%)0.0%
Memory size5.5 MiB

Length

Max length24
Median length24
Mean length23.99520778
Min length20
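The variable name for this timestamp block is missing from this extract. Its length statistics show that not every value carries milliseconds. The minimum length is 20, and the "." character occurs 723640 times across 724508 rows, so 868 values appear to lack the ".fff" fraction. This is consistent with the totals: 724508 × 24 - 868 × 4 = 17384720, which matches Total characters.

Any parser for this column therefore has to accept both forms. The following is a stdlib sketch under that assumption; the function name is hypothetical.

```python
from datetime import datetime


def parse_gbif_timestamp(text):
    """Parse an ISO 8601 UTC timestamp such as '2024-12-02T10:16:26.190Z'.

    Tries the millisecond form first, then the form without a fraction.
    The 'Z' suffix is accepted by %z on Python 3.7+.
    """
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {text!r}")


with_ms = parse_gbif_timestamp("2024-12-02T10:16:26.190Z")
without_ms = parse_gbif_timestamp("2024-12-02T10:16:26Z")  # hypothetical 20-character value
print(with_ms - without_ms)  # 0:00:00.190000
```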

Characters and Unicode

Total characters17384720
Distinct characters15
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique984
Unique (%)0.1%

Sample

1st row2024-12-02T10:16:26.190Z
2nd row2024-12-02T10:16:26.321Z
3rd row2024-12-02T10:16:26.322Z
4th row2024-12-02T10:16:26.322Z
5th row2024-12-02T10:16:26.323Z
ValueCountFrequency (%)
2024-12-02t10:17:03.880z 100
 
< 0.1%
2024-12-02t10:17:08.512z 92
 
< 0.1%
2024-12-02t10:17:04.870z 87
 
< 0.1%
2024-12-02t10:17:05.654z 87
 
< 0.1%
2024-12-02t10:16:52.136z 85
 
< 0.1%
2024-12-02t10:16:59.768z 85
 
< 0.1%
2024-12-02t10:17:07.114z 85
 
< 0.1%
2024-12-02t10:16:58.778z 84
 
< 0.1%
2024-12-02t10:17:03.172z 84
 
< 0.1%
2024-12-02t10:17:08.976z 83
 
< 0.1%
Other values (37848) 723636
99.9%

Most occurring characters

ValueCountFrequency (%)
2 3187397
18.3%
0 2663851
15.3%
1 2462647
14.2%
- 1449016
8.3%
: 1449016
8.3%
4 1185381
 
6.8%
6 810755
 
4.7%
T 724508
 
4.2%
Z 724508
 
4.2%
. 723640
 
4.2%
Other values (5) 2004001
11.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 12314032
70.8%
Other Punctuation 2172656
 
12.5%
Dash Punctuation 1449016
 
8.3%
Uppercase Letter 1449016
 
8.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 3187397
25.9%
0 2663851
21.6%
1 2462647
20.0%
4 1185381
 
9.6%
6 810755
 
6.6%
7 497480
 
4.0%
5 488259
 
4.0%
3 420464
 
3.4%
9 301334
 
2.4%
8 296464
 
2.4%
Other Punctuation
ValueCountFrequency (%)
: 1449016
66.7%
. 723640
33.3%
Uppercase Letter
ValueCountFrequency (%)
T 724508
50.0%
Z 724508
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 1449016
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 15935704
91.7%
Latin 1449016
 
8.3%

Most frequent character per script

Common
ValueCountFrequency (%)
2 3187397
20.0%
0 2663851
16.7%
1 2462647
15.5%
- 1449016
9.1%
: 1449016
9.1%
4 1185381
 
7.4%
6 810755
 
5.1%
. 723640
 
4.5%
7 497480
 
3.1%
5 488259
 
3.1%
Other values (3) 1018262
 
6.4%
Latin
ValueCountFrequency (%)
T 724508
50.0%
Z 724508
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 17384720
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 3187397
18.3%
0 2663851
15.3%
1 2462647
14.2%
- 1449016
8.3%
: 1449016
8.3%
4 1185381
 
6.8%
6 810755
 
4.7%
T 724508
 
4.2%
Z 724508
 
4.2%
. 723640
 
4.2%
Other values (5) 2004001
11.5%

lastCrawled
Text

Constant 

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size5.5 MiB

Length

Max length24
Median length24
Mean length24
Min length24

Characters and Unicode

Total characters17388192
Distinct characters11
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row2024-12-02T10:02:33.848Z
2nd row2024-12-02T10:02:33.848Z
3rd row2024-12-02T10:02:33.848Z
4th row2024-12-02T10:02:33.848Z
5th row2024-12-02T10:02:33.848Z
ValueCountFrequency (%)
2024-12-02t10:02:33.848z 724508
100.0%
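The report flags lastCrawled as Constant because all 724508 rows share one timestamp. It also flags typifiedName as Constant even though more than 99.9% of that column is missing. This suggests that missing values are ignored when the flag is computed.

The sketch below mirrors that assumption with an `ignore_missing` switch. The function name and the two sample rows are illustrative.

```python
def constant_columns(rows, ignore_missing=True):
    """Names of columns holding a single distinct value across all rows.

    With ignore_missing=True, None is discarded first, so a column that is
    mostly missing but otherwise uniform (like typifiedName) still counts.
    """
    if not rows:
        return []
    result = []
    for column in rows[0]:
        values = {row.get(column) for row in rows}
        if ignore_missing:
            values.discard(None)
        if len(values) == 1:
            result.append(column)
    return result


rows = [
    {"lastCrawled": "2024-12-02T10:02:33.848Z", "gbifRegion": "NORTH_AMERICA", "typifiedName": "Type"},
    {"lastCrawled": "2024-12-02T10:02:33.848Z", "gbifRegion": "AFRICA", "typifiedName": None},
]
print(constant_columns(rows))                        # ['lastCrawled', 'typifiedName']
print(constant_columns(rows, ignore_missing=False))  # ['lastCrawled']
```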

Most occurring characters

ValueCountFrequency (%)
2 3622540
20.8%
0 2898032
16.7%
4 1449016
 
8.3%
- 1449016
 
8.3%
1 1449016
 
8.3%
: 1449016
 
8.3%
3 1449016
 
8.3%
8 1449016
 
8.3%
T 724508
 
4.2%
. 724508
 
4.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 12316636
70.8%
Other Punctuation 2173524
 
12.5%
Dash Punctuation 1449016
 
8.3%
Uppercase Letter 1449016
 
8.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 3622540
29.4%
0 2898032
23.5%
4 1449016
 
11.8%
1 1449016
 
11.8%
3 1449016
 
11.8%
8 1449016
 
11.8%
Other Punctuation
ValueCountFrequency (%)
: 1449016
66.7%
. 724508
33.3%
Uppercase Letter
ValueCountFrequency (%)
T 724508
50.0%
Z 724508
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 1449016
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 15939176
91.7%
Latin 1449016
 
8.3%

Most frequent character per script

Common
ValueCountFrequency (%)
2 3622540
22.7%
0 2898032
18.2%
4 1449016
 
9.1%
- 1449016
 
9.1%
1 1449016
 
9.1%
: 1449016
 
9.1%
3 1449016
 
9.1%
8 1449016
 
9.1%
. 724508
 
4.5%
Latin
ValueCountFrequency (%)
T 724508
50.0%
Z 724508
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 17388192
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 3622540
20.8%
0 2898032
16.7%
4 1449016
 
8.3%
- 1449016
 
8.3%
1 1449016
 
8.3%
: 1449016
 
8.3%
3 1449016
 
8.3%
8 1449016
 
8.3%
T 724508
 
4.2%
. 724508
 
4.2%

repatriated
Boolean

Missing 

Distinct2
Distinct (%)< 0.1%
Missing158317
Missing (%)21.9%
Memory size5.5 MiB
ValueCountFrequency (%)
False 428942
59.2%
True 137249
 
18.9%
(Missing) 158317
 
21.9%
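The text value-count tables appear to use non-missing rows as the denominator. In contrast, this Boolean table treats (Missing) as its own bucket and divides by all 724508 rows. For example, 428942 / 724508 ≈ 59.2%.

A stdlib sketch of that calculation, with a hypothetical function name:

```python
from collections import Counter


def boolean_frequencies(values):
    """Frequencies over all rows, with missing values (None) reported as their own bucket."""
    counts = Counter("(Missing)" if v is None else v for v in values)
    total = len(values)
    return {key: (n, round(100 * n / total, 1)) for key, n in counts.items()}


# Counts copied from the repatriated table.
repatriated = [False] * 428942 + [True] * 137249 + [None] * 158317
for key, (n, pct) in boolean_frequencies(repatriated).items():
    print(key, n, f"{pct}%")
```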

isSequenced
Boolean

Constant 

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size707.7 KiB
ValueCountFrequency (%)
False 724508
100.0%

gbifRegion
Text

Missing 

Distinct7
Distinct (%)< 0.1%
Missing160612
Missing (%)22.2%
Memory size5.5 MiB

Length

Max length13
Median length13
Mean length12.4128545
Min length4

Characters and Unicode

Total characters6999559
Distinct characters16
Distinct categories2
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNORTH_AMERICA
2nd rowAFRICA
3rd rowNORTH_AMERICA
4th rowLATIN_AMERICA
5th rowNORTH_AMERICA
ValueCountFrequency (%)
north_america 468544
83.1%
latin_america 47663
 
8.5%
europe 16154
 
2.9%
asia 10382
 
1.8%
oceania 9334
 
1.7%
africa 8278
 
1.5%
antarctica 3541
 
0.6%
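gbifRegion has only seven distinct values, so its value counts are enough to reproduce several figures above:

- the number of present rows, and hence Missing (160612);
- Total characters (6999559) and Mean length (12.4128545);
- the frequency column, whose percentages are relative to the 563896 non-missing rows.

The sketch below recomputes those figures from the copied counts.

```python
# Value counts copied from the gbifRegion table in this report.
counts = {
    "NORTH_AMERICA": 468544, "LATIN_AMERICA": 47663, "EUROPE": 16154,
    "ASIA": 10382, "OCEANIA": 9334, "AFRICA": 8278, "ANTARCTICA": 3541,
}

n_present = sum(counts.values())                          # non-missing rows
n_missing = 724508 - n_present                            # all rows minus present rows
total_chars = sum(len(v) * n for v, n in counts.items())  # length-weighted sum
mean_length = total_chars / n_present

print(n_present, n_missing)
print(total_chars, mean_length)
for value, n in counts.items():
    print(f"{value.lower()}: {100 * n / n_present:.1f}%")
```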

Most occurring characters

ValueCountFrequency (%)
A 1146688
16.4%
R 1012724
14.5%
I 595405
8.5%
E 557849
8.0%
C 540901
7.7%
N 529082
7.6%
T 523289
7.5%
_ 516207
7.4%
M 516207
7.4%
O 494032
7.1%
Other values (6) 567175
8.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 6483352
92.6%
Connector Punctuation 516207
 
7.4%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 1146688
17.7%
R 1012724
15.6%
I 595405
9.2%
E 557849
8.6%
C 540901
8.3%
N 529082
8.2%
T 523289
8.1%
M 516207
8.0%
O 494032
7.6%
H 468544
7.2%
Other values (5) 98631
 
1.5%
Connector Punctuation
ValueCountFrequency (%)
_ 516207
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6483352
92.6%
Common 516207
 
7.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 1146688
17.7%
R 1012724
15.6%
I 595405
9.2%
E 557849
8.6%
C 540901
8.3%
N 529082
8.2%
T 523289
8.1%
M 516207
8.0%
O 494032
7.6%
H 468544
7.2%
Other values (5) 98631
 
1.5%
Common
ValueCountFrequency (%)
_ 516207
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6999559
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 1146688
16.4%
R 1012724
14.5%
I 595405
8.5%
E 557849
8.0%
C 540901
7.7%
N 529082
7.6%
T 523289
7.5%
_ 516207
7.4%
M 516207
7.4%
O 494032
7.1%
Other values (6) 567175
8.1%

publishedByGbifRegion
Text

Constant 

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size5.5 MiB

Length

Max length13
Median length13
Mean length13
Min length13

Characters and Unicode

Total characters9418604
Distinct characters11
Distinct categories2
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNORTH_AMERICA
2nd rowNORTH_AMERICA
3rd rowNORTH_AMERICA
4th rowNORTH_AMERICA
5th rowNORTH_AMERICA
ValueCountFrequency (%)
north_america 724508
100.0%

Most occurring characters

ValueCountFrequency (%)
R 1449016
15.4%
A 1449016
15.4%
N 724508
7.7%
O 724508
7.7%
T 724508
7.7%
H 724508
7.7%
_ 724508
7.7%
M 724508
7.7%
E 724508
7.7%
I 724508
7.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 8694096
92.3%
Connector Punctuation 724508
 
7.7%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
R 1449016
16.7%
A 1449016
16.7%
N 724508
8.3%
O 724508
8.3%
T 724508
8.3%
H 724508
8.3%
M 724508
8.3%
E 724508
8.3%
I 724508
8.3%
C 724508
8.3%
Connector Punctuation
ValueCountFrequency (%)
_ 724508
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 8694096
92.3%
Common 724508
 
7.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
R 1449016
16.7%
A 1449016
16.7%
N 724508
8.3%
O 724508
8.3%
T 724508
8.3%
H 724508
8.3%
M 724508
8.3%
E 724508
8.3%
I 724508
8.3%
C 724508
8.3%
Common
ValueCountFrequency (%)
_ 724508
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9418604
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
R 1449016
15.4%
A 1449016
15.4%
N 724508
7.7%
O 724508
7.7%
T 724508
7.7%
H 724508
7.7%
_ 724508
7.7%
M 724508
7.7%
E 724508
7.7%
I 724508
7.7%

level0Gid
Text

Missing 

Distinct88
Distinct (%)0.2%
Missing686240
Missing (%)94.7%
Memory size5.5 MiB

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters114804
Distinct characters25
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique17
Unique (%)< 0.1%

Sample

1st rowUSA
2nd rowUSA
3rd rowUSA
4th rowUSA
5th rowUSA
ValueCountFrequency (%)
usa 33578
87.7%
mex 743
 
1.9%
can 398
 
1.0%
gum 255
 
0.7%
mnp 228
 
0.6%
pan 217
 
0.6%
idn 210
 
0.5%
umi 206
 
0.5%
fra 198
 
0.5%
pak 155
 
0.4%
Other values (78) 2080
 
5.4%

Most occurring characters

ValueCountFrequency (%)
A 35038
30.5%
U 34448
30.0%
S 33870
29.5%
M 1679
 
1.5%
N 1312
 
1.1%
E 1296
 
1.1%
P 945
 
0.8%
I 802
 
0.7%
X 743
 
0.6%
R 715
 
0.6%
Other values (15) 3956
 
3.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 114804
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 35038
30.5%
U 34448
30.0%
S 33870
29.5%
M 1679
 
1.5%
N 1312
 
1.1%
E 1296
 
1.1%
P 945
 
0.8%
I 802
 
0.7%
X 743
 
0.6%
R 715
 
0.6%
Other values (15) 3956
 
3.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 114804
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 35038
30.5%
U 34448
30.0%
S 33870
29.5%
M 1679
 
1.5%
N 1312
 
1.1%
E 1296
 
1.1%
P 945
 
0.8%
I 802
 
0.7%
X 743
 
0.6%
R 715
 
0.6%
Other values (15) 3956
 
3.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 114804
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 35038
30.5%
U 34448
30.0%
S 33870
29.5%
M 1679
 
1.5%
N 1312
 
1.1%
E 1296
 
1.1%
P 945
 
0.8%
I 802
 
0.7%
X 743
 
0.6%
R 715
 
0.6%
Other values (15) 3956
 
3.4%

level0Name
Text

Missing 

Distinct88
Distinct (%)0.2%
Missing686240
Missing (%)94.7%
Memory size5.5 MiB

Length

Max length32
Median length13
Mean length12.50533082
Min length4

Characters and Unicode

Total characters478554
Distinct characters53
Distinct categories4
Distinct scripts2
Distinct blocks2

Unique

Unique17
Unique (%)< 0.1%

Sample

1st rowUnited States
2nd rowUnited States
3rd rowUnited States
4th rowUnited States
5th rowUnited States
ValueCountFrequency (%)
united 33879
46.0%
states 33784
45.9%
méxico 743
 
1.0%
canada 398
 
0.5%
islands 291
 
0.4%
guam 255
 
0.3%
northern 235
 
0.3%
mariana 228
 
0.3%
panama 217
 
0.3%
indonesia 210
 
0.3%
Other values (93) 3333
 
4.5%
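For free-text variables the value-count table appears to count lower-cased words rather than whole values. For example, "united" (33879) and "states" (33784) are listed separately, and "islands" pools several multi-word country names.

The sketch below reproduces that behaviour with a plain whitespace split. The profiler's real tokenizer evidently differs in detail: it reports "n.a" for level3Name values such as "n.a. (108)", so it probably also strips some punctuation. The sample values are illustrative, not dataset rows.

```python
from collections import Counter


def word_counts(values):
    """Lower-case each value and count whitespace-separated words."""
    return Counter(word for value in values for word in value.lower().split())


sample = ["United States", "United States", "México", "Northern Mariana Islands", "Canada"]
for word, n in word_counts(sample).most_common():
    print(word, n)
```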

Most occurring characters

ValueCountFrequency (%)
t 102628
21.4%
e 69101
14.4%
a 39466
 
8.2%
n 37290
 
7.8%
i 37192
 
7.8%
35305
 
7.4%
s 35286
 
7.4%
d 35157
 
7.3%
S 34062
 
7.1%
U 33936
 
7.1%
Other values (43) 19131
 
4.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 369454
77.2%
Uppercase Letter 73623
 
15.4%
Space Separator 35305
 
7.4%
Other Punctuation 172
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 102628
27.8%
e 69101
18.7%
a 39466
 
10.7%
n 37290
 
10.1%
i 37192
 
10.1%
s 35286
 
9.6%
d 35157
 
9.5%
o 2317
 
0.6%
r 1965
 
0.5%
c 1490
 
0.4%
Other values (16) 7562
 
2.0%
Uppercase Letter
ValueCountFrequency (%)
S 34062
46.3%
U 33936
46.1%
M 1323
 
1.8%
I 913
 
1.2%
P 587
 
0.8%
C 586
 
0.8%
G 328
 
0.4%
N 301
 
0.4%
E 210
 
0.3%
O 206
 
0.3%
Other values (13) 1171
 
1.6%
Other Punctuation
ValueCountFrequency (%)
. 114
66.3%
, 57
33.1%
' 1
 
0.6%
Space Separator
ValueCountFrequency (%)
35305
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 443077
92.6%
Common 35477
 
7.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 102628
23.2%
e 69101
15.6%
a 39466
 
8.9%
n 37290
 
8.4%
i 37192
 
8.4%
s 35286
 
8.0%
d 35157
 
7.9%
S 34062
 
7.7%
U 33936
 
7.7%
o 2317
 
0.5%
Other values (39) 16642
 
3.8%
Common
ValueCountFrequency (%)
35305
99.5%
. 114
 
0.3%
, 57
 
0.2%
' 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 477810
99.8%
None 744
 
0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 102628
21.5%
e 69101
14.5%
a 39466
 
8.3%
n 37290
 
7.8%
i 37192
 
7.8%
35305
 
7.4%
s 35286
 
7.4%
d 35157
 
7.4%
S 34062
 
7.1%
U 33936
 
7.1%
Other values (41) 18387
 
3.8%
None
ValueCountFrequency (%)
é 743
99.9%
ô 1
 
0.1%
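The "None" block in level0Name holds 744 characters (743 "é" plus 1 "ô"). It appears to collect characters the profiler did not assign to a named block. In Unicode both characters (U+00E9 and U+00F4) belong to the Latin-1 Supplement block.

A stdlib sketch that lists the non-ASCII characters in a column, using `str.isascii` (Python 3.7+). The sample values are illustrative only.

```python
import unicodedata
from collections import Counter


def non_ascii_characters(values):
    """Count characters outside 7-bit ASCII (code points above U+007F)."""
    return Counter(ch for value in values for ch in value if not ch.isascii())


# Illustrative values only; not taken row-by-row from the dataset.
sample = ["United States", "México", "Côte d'Ivoire"]
for ch, n in non_ascii_characters(sample).items():
    print(f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)}: {n}")
```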

level1Gid
Text

Missing 

Distinct353
Distinct (%)0.9%
Missing686243
Missing (%)94.7%
Memory size5.5 MiB

Length

Max length8
Median length8
Mean length7.803449628
Min length7

Characters and Unicode

Total characters298599
Distinct characters37
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique98
Unique (%)0.3%

Sample

1st rowUSA.10_1
2nd rowUSA.29_1
3rd rowUSA.2_1
4th rowUSA.44_1
5th rowUSA.38_1
ValueCountFrequency (%)
usa.44_1 3802
 
9.9%
usa.38_1 2959
 
7.7%
usa.23_1 2129
 
5.6%
usa.34_1 2095
 
5.5%
usa.10_1 1141
 
3.0%
usa.17_1 1123
 
2.9%
usa.32_1 1117
 
2.9%
usa.18_1 1042
 
2.7%
usa.1_1 1017
 
2.7%
usa.2_1 983
 
2.6%
Other values (343) 20857
54.5%
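The levelNGid values share one visible shape. Each starts with a three-letter country code (as in level0Gid). This is followed by one dot-separated index per administrative level, as in "USA.10_1", "USA.44.252_1" and "PAN.4.2.2_1", and then a trailing "_1" or "_2". That shape is consistent with the lengths above: "USA.2_1" is 7 characters and "USA.10_1" is 8.

The report does not say what the suffix means, so the sketch below keeps it as an opaque integer. The regex and function name are hypothetical. Apply it to the original upper-case values, not to the lower-cased entries in the value-count tables.

```python
import re

# Assumed shape of these GID strings: a three-letter country code, zero or more
# ".<index>" parts (one per administrative level), and an optional "_<suffix>".
GID_PATTERN = re.compile(r"^([A-Z]{3})((?:\.\d+)*)(?:_(\d+))?$")


def parse_gid(gid):
    """Split a GID such as 'USA.44.252_1' into (country, [indices], suffix)."""
    match = GID_PATTERN.match(gid)
    if match is None:
        raise ValueError(f"unexpected GID format: {gid!r}")
    country, path, suffix = match.groups()
    indices = [int(part) for part in path.split(".") if part]
    return country, indices, int(suffix) if suffix is not None else None


for gid in ["USA", "USA.10_1", "MEX.30.91_2", "PAN.4.2.2_1"]:
    print(gid, parse_gid(gid))
```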

Most occurring characters

ValueCountFrequency (%)
1 48709
16.3%
. 38265
12.8%
_ 38265
12.8%
A 35032
11.7%
U 34448
11.5%
S 33870
11.3%
4 15952
 
5.3%
3 14650
 
4.9%
2 8837
 
3.0%
8 5433
 
1.8%
Other values (27) 25138
8.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 114795
38.4%
Decimal Number 107274
35.9%
Other Punctuation 38265
 
12.8%
Connector Punctuation 38265
 
12.8%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 35032
30.5%
U 34448
30.0%
S 33870
29.5%
M 1679
 
1.5%
N 1312
 
1.1%
E 1296
 
1.1%
P 945
 
0.8%
I 802
 
0.7%
X 743
 
0.6%
R 715
 
0.6%
Other values (15) 3953
 
3.4%
Decimal Number
ValueCountFrequency (%)
1 48709
45.4%
4 15952
 
14.9%
3 14650
 
13.7%
2 8837
 
8.2%
8 5433
 
5.1%
5 3413
 
3.2%
7 3029
 
2.8%
6 2776
 
2.6%
0 2416
 
2.3%
9 2059
 
1.9%
Other Punctuation
ValueCountFrequency (%)
. 38265
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 38265
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 183804
61.6%
Latin 114795
38.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 35032
30.5%
U 34448
30.0%
S 33870
29.5%
M 1679
 
1.5%
N 1312
 
1.1%
E 1296
 
1.1%
P 945
 
0.8%
I 802
 
0.7%
X 743
 
0.6%
R 715
 
0.6%
Other values (15) 3953
 
3.4%
Common
ValueCountFrequency (%)
1 48709
26.5%
. 38265
20.8%
_ 38265
20.8%
4 15952
 
8.7%
3 14650
 
8.0%
2 8837
 
4.8%
8 5433
 
3.0%
5 3413
 
1.9%
7 3029
 
1.6%
6 2776
 
1.5%
Other values (2) 4475
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 298599
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 48709
16.3%
. 38265
12.8%
_ 38265
12.8%
A 35032
11.7%
U 34448
11.5%
S 33870
11.3%
4 15952
 
5.3%
3 14650
 
4.9%
2 8837
 
3.0%
8 5433
 
1.8%
Other values (27) 25138
8.4%

level1Name
Text

Missing 

Distinct353
Distinct (%)0.9%
Missing686243
Missing (%)94.7%
Memory size5.5 MiB

Length

Max length32
Median length29
Mean length8.062981837
Min length3

Characters and Unicode

Total characters308530
Distinct characters81
Distinct categories6
Distinct scripts2
Distinct blocks2

Unique

Unique98
Unique (%)0.3%

Sample

1st rowFlorida
2nd rowNevada
3rd rowAlaska
4th rowTexas
5th rowOregon
ValueCountFrequency (%)
texas 3802
 
8.4%
oregon 2959
 
6.5%
carolina 2734
 
6.0%
new 2376
 
5.2%
michigan 2129
 
4.7%
north 2102
 
4.6%
florida 1141
 
2.5%
kansas 1123
 
2.5%
mexico 1117
 
2.5%
kentucky 1042
 
2.3%
Other values (409) 24889
54.8%

Most occurring characters

ValueCountFrequency (%)
a 41520
 
13.5%
i 26049
 
8.4%
o 23918
 
7.8%
n 23868
 
7.7%
e 19743
 
6.4%
r 18679
 
6.1%
s 17076
 
5.5%
l 13026
 
4.2%
h 9794
 
3.2%
t 9186
 
3.0%
Other values (71) 105671
34.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 254467
82.5%
Uppercase Letter 45849
 
14.9%
Space Separator 7149
 
2.3%
Dash Punctuation 932
 
0.3%
Other Punctuation 132
 
< 0.1%
Modifier Symbol 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 41520
16.3%
i 26049
10.2%
o 23918
9.4%
n 23868
9.4%
e 19743
 
7.8%
r 18679
 
7.3%
s 17076
 
6.7%
l 13026
 
5.1%
h 9794
 
3.8%
t 9186
 
3.6%
Other values (37) 51608
20.3%
Uppercase Letter
ValueCountFrequency (%)
M 6252
13.6%
N 5334
11.6%
C 5061
11.0%
O 4909
10.7%
T 4580
10.0%
A 3206
 
7.0%
K 2441
 
5.3%
W 2083
 
4.5%
V 1707
 
3.7%
S 1669
 
3.6%
Other values (19) 8607
18.8%
Other Punctuation
ValueCountFrequency (%)
' 130
98.5%
/ 2
 
1.5%
Space Separator
ValueCountFrequency (%)
7149
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 932
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 300316
97.3%
Common 8214
 
2.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 41520
13.8%
i 26049
 
8.7%
o 23918
 
8.0%
n 23868
 
7.9%
e 19743
 
6.6%
r 18679
 
6.2%
s 17076
 
5.7%
l 13026
 
4.3%
h 9794
 
3.3%
t 9186
 
3.1%
Other values (66) 97457
32.5%
Common
ValueCountFrequency (%)
7149
87.0%
- 932
 
11.3%
' 130
 
1.6%
/ 2
 
< 0.1%
` 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 307555
99.7%
None 975
 
0.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 41520
 
13.5%
i 26049
 
8.5%
o 23918
 
7.8%
n 23868
 
7.8%
e 19743
 
6.4%
r 18679
 
6.1%
s 17076
 
5.6%
l 13026
 
4.2%
h 9794
 
3.2%
t 9186
 
3.0%
Other values (46) 104696
34.0%
None
ValueCountFrequency (%)
ó 226
23.2%
á 167
17.1%
é 142
14.6%
ý 124
12.7%
í 96
9.8%
ñ 52
 
5.3%
Î 36
 
3.7%
š 25
 
2.6%
ô 21
 
2.2%
ę 15
 
1.5%
Other values (15) 71
 
7.3%

level2Gid
Text

Missing 

Distinct1562
Distinct (%)4.2%
Missing687320
Missing (%)94.9%
Memory size5.5 MiB

Length

Max length12
Median length11
Mean length10.68914704
Min length9

Characters and Unicode

Total characters397508
Distinct characters37
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique384
Unique (%)1.0%

Sample

1st rowUSA.10.3_1
2nd rowUSA.29.10_1
3rd rowUSA.2.17_1
4th rowUSA.44.57_1
5th rowUSA.38.21_1
ValueCountFrequency (%)
usa.23.44_1 1758
 
4.7%
usa.38.21_1 1751
 
4.7%
mex.30.91_2 673
 
1.8%
usa.36.44_1 428
 
1.2%
usa.8.2_1 412
 
1.1%
usa.41.8_1 377
 
1.0%
usa.2.17_1 366
 
1.0%
usa.44.22_1 329
 
0.9%
usa.44.252_1 321
 
0.9%
usa.32.31_1 307
 
0.8%
Other values (1552) 30466
81.9%

Most occurring characters

ValueCountFrequency (%)
. 74376
18.7%
1 61042
15.4%
_ 37188
9.4%
A 34984
8.8%
U 33973
8.5%
S 33842
8.5%
4 26872
 
6.8%
2 21885
 
5.5%
3 20338
 
5.1%
8 8756
 
2.2%
Other values (27) 44252
11.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 174380
43.9%
Uppercase Letter 111564
28.1%
Other Punctuation 74376
18.7%
Connector Punctuation 37188
 
9.4%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 34984
31.4%
U 33973
30.5%
S 33842
30.3%
E 1269
 
1.1%
N 1077
 
1.0%
M 923
 
0.8%
X 743
 
0.7%
C 674
 
0.6%
P 593
 
0.5%
R 571
 
0.5%
Other values (15) 2915
 
2.6%
Decimal Number
ValueCountFrequency (%)
1 61042
35.0%
4 26872
15.4%
2 21885
 
12.6%
3 20338
 
11.7%
8 8756
 
5.0%
5 8670
 
5.0%
7 7987
 
4.6%
6 7073
 
4.1%
9 6011
 
3.4%
0 5746
 
3.3%
Other Punctuation
ValueCountFrequency (%)
. 74376
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 37188
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 285944
71.9%
Latin 111564
 
28.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 34984
31.4%
U 33973
30.5%
S 33842
30.3%
E 1269
 
1.1%
N 1077
 
1.0%
M 923
 
0.8%
X 743
 
0.7%
C 674
 
0.6%
P 593
 
0.5%
R 571
 
0.5%
Other values (15) 2915
 
2.6%
Common
ValueCountFrequency (%)
. 74376
26.0%
1 61042
21.3%
_ 37188
13.0%
4 26872
 
9.4%
2 21885
 
7.7%
3 20338
 
7.1%
8 8756
 
3.1%
5 8670
 
3.0%
7 7987
 
2.8%
6 7073
 
2.5%
Other values (2) 11757
 
4.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 397508
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 74376
18.7%
1 61042
15.4%
_ 37188
9.4%
A 34984
8.8%
U 33973
8.5%
S 33842
8.5%
4 26872
 
6.8%
2 21885
 
5.5%
3 20338
 
5.1%
8 8756
 
2.2%
Other values (27) 44252
11.1%

level2Name
Text

Missing 

Distinct1254
Distinct (%)3.4%
Missing687320
Missing (%)94.9%
Memory size5.5 MiB

Length

Max length32
Median length25
Mean length7.870119393
Min length3

Characters and Unicode

Total characters292674
Distinct characters85
Distinct categories8
Distinct scripts2
Distinct blocks2

Unique

Unique303
Unique (%)0.8%

Sample

1st rowBay
2nd rowLincoln
3rd rowNorth Slope
4th rowDallas
5th rowLincoln
ValueCountFrequency (%)
lake 3351
 
7.3%
hurron 1795
 
3.9%
lincoln 1776
 
3.9%
superior 694
 
1.5%
jesús 673
 
1.5%
carranza 673
 
1.5%
washington 612
 
1.3%
new 537
 
1.2%
san 534
 
1.2%
erie 465
 
1.0%
Other values (1364) 34807
75.8%

Most occurring characters

ValueCountFrequency (%)
a 29477
 
10.1%
e 27848
 
9.5%
r 23544
 
8.0%
n 23216
 
7.9%
o 21878
 
7.5%
l 16497
 
5.6%
i 15387
 
5.3%
t 11258
 
3.8%
s 11244
 
3.8%
u 8940
 
3.1%
Other values (75) 103385
35.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 236559
80.8%
Uppercase Letter 46425
 
15.9%
Space Separator 8729
 
3.0%
Dash Punctuation 537
 
0.2%
Other Punctuation 402
 
0.1%
Open Punctuation 10
 
< 0.1%
Close Punctuation 10
 
< 0.1%
Decimal Number 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 29477
12.5%
e 27848
11.8%
r 23544
10.0%
n 23216
9.8%
o 21878
9.2%
l 16497
 
7.0%
i 15387
 
6.5%
t 11258
 
4.8%
s 11244
 
4.8%
u 8940
 
3.8%
Other values (38) 47270
20.0%
Uppercase Letter
ValueCountFrequency (%)
L 7061
15.2%
C 6706
14.4%
S 4155
 
8.9%
B 3487
 
7.5%
H 3264
 
7.0%
M 2112
 
4.5%
P 2058
 
4.4%
W 2005
 
4.3%
T 1666
 
3.6%
D 1665
 
3.6%
Other values (17) 12246
26.4%
Other Punctuation
ValueCountFrequency (%)
' 333
82.8%
/ 47
 
11.7%
. 21
 
5.2%
, 1
 
0.2%
Decimal Number
ValueCountFrequency (%)
2 1
50.0%
3 1
50.0%
Space Separator
ValueCountFrequency (%)
8729
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 537
100.0%
Open Punctuation
ValueCountFrequency (%)
( 10
100.0%
Close Punctuation
ValueCountFrequency (%)
) 10
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 282984
96.7%
Common 9690
 
3.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 29477
 
10.4%
e 27848
 
9.8%
r 23544
 
8.3%
n 23216
 
8.2%
o 21878
 
7.7%
l 16497
 
5.8%
i 15387
 
5.4%
t 11258
 
4.0%
s 11244
 
4.0%
u 8940
 
3.2%
Other values (65) 93695
33.1%
Common
ValueCountFrequency (%)
8729
90.1%
- 537
 
5.5%
' 333
 
3.4%
/ 47
 
0.5%
. 21
 
0.2%
( 10
 
0.1%
) 10
 
0.1%
2 1
 
< 0.1%
3 1
 
< 0.1%
, 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 291303
99.5%
None 1371
 
0.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 29477
 
10.1%
e 27848
 
9.6%
r 23544
 
8.1%
n 23216
 
8.0%
o 21878
 
7.5%
l 16497
 
5.7%
i 15387
 
5.3%
t 11258
 
3.9%
s 11244
 
3.9%
u 8940
 
3.1%
Other values (51) 102014
35.0%
None
ValueCountFrequency (%)
ú 673
49.1%
ó 328
23.9%
é 112
 
8.2%
í 101
 
7.4%
š 26
 
1.9%
á 25
 
1.8%
è 22
 
1.6%
ř 20
 
1.5%
ü 14
 
1.0%
ô 9
 
0.7%
Other values (14) 41
 
3.0%

level3Gid
Text

Missing 

Distinct340
Distinct (%)17.0%
Missing722506
Missing (%)99.7%
Memory size5.5 MiB

Length

Max length14
Median length13
Mean length11.82917083
Min length11

Characters and Unicode

Total characters23682
Distinct characters33
Distinct categories4
Distinct scripts2
Distinct blocks1

Unique

Unique159
Unique (%)7.9%

Sample

1st rowIDN.34.7.16_1
2nd rowMMR.7.4.6_1
3rd rowPOL.15.20.6_1
4th rowPAK.7.8.3_1
5th rowESP.17.1.4_1
ValueCountFrequency (%)
pan.4.2.2_1 216
 
10.8%
idn.34.7.16_1 162
 
8.1%
ecu.9.2.2_1 82
 
4.1%
can.9.24.1_1 79
 
3.9%
pak.7.8.3_1 59
 
2.9%
can.8.1.2_1 56
 
2.8%
mar.4.2.10_1 41
 
2.0%
can.9.22.1_1 37
 
1.8%
can.9.32.5_1 30
 
1.5%
can.9.23.1_1 30
 
1.5%
Other values (330) 1210
60.4%

Most occurring characters

ValueCountFrequency (%)
. 6006
25.4%
1 3975
16.8%
_ 2002
 
8.5%
2 1598
 
6.7%
A 1231
 
5.2%
4 921
 
3.9%
N 838
 
3.5%
3 797
 
3.4%
P 565
 
2.4%
C 518
 
2.2%
Other values (23) 5231
22.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 9668
40.8%
Other Punctuation 6006
25.4%
Uppercase Letter 6006
25.4%
Connector Punctuation 2002
 
8.5%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 1231
20.5%
N 838
14.0%
P 565
9.4%
C 518
8.6%
R 411
 
6.8%
I 371
 
6.2%
E 307
 
5.1%
D 302
 
5.0%
F 198
 
3.3%
T 165
 
2.7%
Other values (11) 1100
18.3%
Decimal Number
ValueCountFrequency (%)
1 3975
41.1%
2 1598
16.5%
4 921
 
9.5%
3 797
 
8.2%
9 485
 
5.0%
7 479
 
5.0%
5 449
 
4.6%
6 405
 
4.2%
8 337
 
3.5%
0 222
 
2.3%
Other Punctuation
ValueCountFrequency (%)
. 6006
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2002
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 17676
74.6%
Latin 6006
 
25.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 1231
20.5%
N 838
14.0%
P 565
9.4%
C 518
8.6%
R 411
 
6.8%
I 371
 
6.2%
E 307
 
5.1%
D 302
 
5.0%
F 198
 
3.3%
T 165
 
2.7%
Other values (11) 1100
18.3%
Common
ValueCountFrequency (%)
. 6006
34.0%
1 3975
22.5%
_ 2002
 
11.3%
2 1598
 
9.0%
4 921
 
5.2%
3 797
 
4.5%
9 485
 
2.7%
7 479
 
2.7%
5 449
 
2.5%
6 405
 
2.3%
Other values (2) 559
 
3.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 23682
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 6006
25.4%
1 3975
16.8%
_ 2002
 
8.5%
2 1598
 
6.7%
A 1231
 
5.2%
4 921
 
3.9%
N 838
 
3.5%
3 797
 
3.4%
P 565
 
2.4%
C 518
 
2.2%
Other values (23) 5231
22.1%

level3Name
Text

Missing 

Distinct340
Distinct (%)17.0%
Missing722506
Missing (%)99.7%
Memory size5.5 MiB

Length

Max length32
Median length24
Mean length11.58741259
Min length3

Characters and Unicode

Total characters23198
Distinct characters79
Distinct categories8
Distinct scripts2
Distinct blocks2

Unique

Unique159
Unique (%)7.9%

Sample

1st row  Sangkulirang
2nd row  Thayet
3rd row  Raszków
4th row  Mianwali
5th row  n.a. (108)
Value               Count  Frequency (%)
barrio              216    6.2%
sur                 216    6.2%
lake                172    5.0%
sangkulirang        162    4.7%
santa               84     2.4%
cab                 82     2.4%
n.a                 82     2.4%
floreana            82     2.4%
isla                82     2.4%
mara                82     2.4%
Other values (426)  2205   63.6%

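The word table above is lower-cased, and the entry "n.a" (without its final period, while the sample row reads "n.a. (108)") is consistent with punctuation being trimmed from word edges. A minimal sketch of that kind of tokenization, assuming whitespace splitting and `string.punctuation` as the trimmed set; the profiler's actual tokenizer may differ, and `word_counts` is a hypothetical name:

```python
import string
from collections import Counter

def word_counts(values):
    """Lower-case each value, split on whitespace and trim punctuation at word edges."""
    counts = Counter()
    for value in values:
        if value is None:
            continue  # missing cells contribute no words
        for token in value.lower().split():
            word = token.strip(string.punctuation)
            if word:  # skip tokens made only of punctuation
                counts[word] += 1
    return counts

print(word_counts(["n.a. (108)", "Isla Floreana", "Sangkulirang", None]))
```

Trimming only at the edges keeps internal periods, so abbreviations such as "n.a" survive as a single word.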
Most occurring characters

Value              Count  Frequency (%)
a                  3242   14.0%
r                  2007   8.7%
n                  1526   6.6%
i                  1489   6.4%
(space)            1463   6.3%
e                  1431   6.2%
o                  1107   4.8%
u                  917    4.0%
l                  907    3.9%
S                  747    3.2%
Other values (69)  8362   36.0%

Most occurring categories

Value              Count  Frequency (%)
Lowercase Letter   17193  74.1%
Uppercase Letter   3353   14.5%
Space Separator    1463   6.3%
Other Punctuation  378    1.6%
Open Punctuation   286    1.2%
Decimal Number     254    1.1%
Close Punctuation  204    0.9%
Dash Punctuation   67     0.3%

Most frequent character per category

Lowercase Letter
Value              Count  Frequency (%)
a                  3242   18.9%
r                  2007   11.7%
n                  1526   8.9%
i                  1489   8.7%
e                  1431   8.3%
o                  1107   6.4%
u                  917    5.3%
l                  907    5.3%
t                  616    3.6%
g                  594    3.5%
Other values (26)  3357   19.5%
Uppercase Letter
Value              Count  Frequency (%)
S                  747    22.3%
B                  433    12.9%
L                  308    9.2%
M                  229    6.8%
C                  190    5.7%
F                  169    5.0%
I                  144    4.3%
K                  137    4.1%
A                  111    3.3%
D                  105    3.1%
Other values (15)  780    23.3%
Decimal Number
Value  Count  Frequency (%)
2      65     25.6%
1      52     20.5%
6      31     12.2%
8      25     9.8%
0      19     7.5%
3      19     7.5%
7      17     6.7%
5      12     4.7%
4      8      3.1%
9      6      2.4%
Other Punctuation
Value  Count  Frequency (%)
.      277    73.3%
,      73     19.3%
'      24     6.3%
/      4      1.1%
Space Separator
Value    Count  Frequency (%)
(space)  1463   100.0%
Open Punctuation
Value  Count  Frequency (%)
(      286    100.0%
Close Punctuation
Value  Count  Frequency (%)
)      204    100.0%
Dash Punctuation
Value  Count  Frequency (%)
-      67     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   20546  88.6%
Common  2652   11.4%

Most frequent character per script

Latin
Value              Count  Frequency (%)
a                  3242   15.8%
r                  2007   9.8%
n                  1526   7.4%
i                  1489   7.2%
e                  1431   7.0%
o                  1107   5.4%
u                  917    4.5%
l                  907    4.4%
S                  747    3.6%
t                  616    3.0%
Other values (51)  6557   31.9%
Common
Value             Count  Frequency (%)
(space)           1463   55.2%
(                 286    10.8%
.                 277    10.4%
)                 204    7.7%
,                 73     2.8%
-                 67     2.5%
2                 65     2.5%
1                 52     2.0%
6                 31     1.2%
8                 25     0.9%
Other values (8)  109    4.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  23097  99.6%
None   101    0.4%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
a                  3242   14.0%
r                  2007   8.7%
n                  1526   6.6%
i                  1489   6.4%
(space)            1463   6.3%
e                  1431   6.2%
o                  1107   4.8%
u                  917    4.0%
l                  907    3.9%
S                  747    3.2%
Other values (58)  8261   35.8%
None
Value  Count  Frequency (%)
é      28     27.7%
è      21     20.8%
É      19     18.8%
ü      9      8.9%
ó      8      7.9%
í      7      6.9%
á      3      3.0%
ę      2      2.0%
ö      2      2.0%
ê      1      1.0%

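The 101 characters outside the ASCII block are accented Latin letters. A minimal sketch for locating such characters in a column and naming them with the standard `unicodedata` module, assuming a simple code-point test (above 127) in place of a full Unicode block lookup. Normalizing to NFC first is also an assumption: it folds a base letter plus a combining accent into one precomposed character, and the report does not say whether the profiler normalizes. The category table above lists no combining-mark category, which is consistent with these accents already being precomposed.

```python
import unicodedata
from collections import Counter

def non_ascii_characters(values):
    """Count characters outside ASCII (code point above 127), after NFC normalization."""
    counts = Counter()
    for value in values:
        if value is None:
            continue  # missing cells contribute no characters
        # NFC turns a base letter plus combining accent into one precomposed character
        composed = unicodedata.normalize("NFC", value)
        counts.update(ch for ch in composed if ord(ch) > 127)
    return counts

for ch, n in non_ascii_characters(["Raszków", "Mianwali", None]).items():
    print(ch, n, unicodedata.name(ch))  # ó 1 LATIN SMALL LETTER O WITH ACUTE
```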
iucnRedListCategory
Text

Missing 

Distinct      9
Distinct (%)  < 0.1%
Missing       365809
Missing (%)   50.5%
Memory size   5.5 MiB

Length

Max length     2
Median length  2
Mean length    2
Min length     2

Characters and Unicode

Total characters     717398
Distinct characters  11
Distinct categories  1
Distinct scripts     1
Distinct blocks      1

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  LC
2nd row  NE
3rd row  NE
4th row  NE
5th row  NE
Value  Count   Frequency (%)
ne     340013  94.8%
lc     7458    2.1%
cr     3457    1.0%
vu     3162    0.9%
en     2012    0.6%
ex     1761    0.5%
nt     761     0.2%
dd     73      < 0.1%
ew     2       < 0.1%

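The nine two-letter values are the standard IUCN Red List category abbreviations. A small lookup table makes the value table easier to read; the sketch below also checks that the reported counts account for every non-missing record (358699 + 365809 missing = 724508 observations) and computes the share falling in the three threatened categories (CR, EN, VU). The dictionary names are illustrative and are not part of the dataset or of any library.

```python
# IUCN Red List category codes as they appear in iucnRedListCategory.
IUCN_CATEGORIES = {
    "EX": "Extinct",
    "EW": "Extinct in the Wild",
    "CR": "Critically Endangered",
    "EN": "Endangered",
    "VU": "Vulnerable",
    "NT": "Near Threatened",
    "LC": "Least Concern",
    "DD": "Data Deficient",
    "NE": "Not Evaluated",
}

# Counts from the value table above (non-missing records only).
counts = {"NE": 340013, "LC": 7458, "CR": 3457, "VU": 3162, "EN": 2012,
          "EX": 1761, "NT": 761, "DD": 73, "EW": 2}

total = sum(counts.values())
threatened = sum(counts[c] for c in ("CR", "EN", "VU"))

print(total, 724508 - total)                # 358699 365809 (the reported Missing)
print(round(100 * threatened / total, 1))   # 2.4
for code, n in counts.items():
    print(f"{code}  {IUCN_CATEGORIES[code]:<22}  {n}")
```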
Most occurring characters

Value  Count   Frequency (%)
E      343788  47.9%
N      342786  47.8%
C      10915   1.5%
L      7458    1.0%
R      3457    0.5%
V      3162    0.4%
U      3162    0.4%
X      1761    0.2%
T      761     0.1%
D      146     < 0.1%

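Because every iucnRedListCategory value is a two-letter code, the character tables follow directly from the value table: each code adds its count once per letter. This also accounts for the eleventh distinct character, W (from the two EW records), which falls outside the ten rows listed. A minimal sketch of the derivation, using only the counts reported above:

```python
from collections import Counter

# Code counts from the iucnRedListCategory value table (non-missing values only).
code_counts = {"NE": 340013, "LC": 7458, "CR": 3457, "VU": 3162, "EN": 2012,
               "EX": 1761, "NT": 761, "DD": 73, "EW": 2}

# Each code contributes its count once for every letter it contains.
chars = Counter()
for code, n in code_counts.items():
    for ch in code:
        chars[ch] += n

print(chars["E"], chars["N"], chars["D"])   # 343788 342786 146
print(sum(chars.values()), len(chars))      # 717398 11
print(chars["W"])                           # 2
```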
Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  717398  100.0%

Most frequent character per category

Uppercase Letter
Value  Count   Frequency (%)
E      343788  47.9%
N      342786  47.8%
C      10915   1.5%
L      7458    1.0%
R      3457    0.5%
V      3162    0.4%
U      3162    0.4%
X      1761    0.2%
T      761     0.1%
D      146     < 0.1%

Most occurring scripts

Value  Count   Frequency (%)
Latin  717398  100.0%

Most frequent character per script

Latin
Value  Count   Frequency (%)
E      343788  47.9%
N      342786  47.8%
C      10915   1.5%
L      7458    1.0%
R      3457    0.5%
V      3162    0.4%
U      3162    0.4%
X      1761    0.2%
T      761     0.1%
D      146     < 0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  717398  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
E      343788  47.9%
N      342786  47.8%
C      10915   1.5%
L      7458    1.0%
R      3457    0.5%
V      3162    0.4%
U      3162    0.4%
X      1761    0.2%
T      761     0.1%
D      146     < 0.1%